Preface for Students


You are probably about to begin your second exposure to linear algebra. Unlike 
your first brush with the subject, which probably emphasized Euclidean spaces 
and matrices, this encounter will focus on abstract vector spaces and linear maps. 
These terms will be defined later, so don’t worry if you do not know what they 
mean. This book starts from the beginning of the subject, assuming no knowledge 
of linear algebra. The key point is that you are about to immerse yourself in 
serious mathematics, with an emphasis on attaining a deep understanding of the 
definitions, theorems, and proofs. 

You cannot read mathematics the way you read a novel. If you zip through a 
page in less than an hour, you are probably going too fast. When you encounter 
the phrase “as you should verify”, you should indeed do the verification, which
will usually require some writing on your part. When steps are left out, you need 
to supply the missing pieces. You should ponder and internalize each definition. 
For each theorem, you should seek examples to show why each hypothesis is 
necessary. Discussions with other students should help. 

As a visual aid, definitions are in yellow boxes and theorems are in blue boxes 
(in color versions of the book). Each theorem has an informal descriptive name.

Please check the website below for additional information about the book, 
including a link to videos that are freely available to accompany the book. 

Your suggestions, comments, and corrections are most welcome. 

Best wishes for success and enjoyment in learning linear algebra! 


Sheldon Axler 
San Francisco State University 


website: https://linear.axler.net 
e-mail: linear@axler.net 
Preface for Instructors


You are about to teach a course that will probably give students their second 
exposure to linear algebra. During their first brush with the subject, your students 
probably worked with Euclidean spaces and matrices. In contrast, this course will 
emphasize abstract vector spaces and linear maps. 

The title of this book deserves an explanation. Most linear algebra textbooks 
use determinants to prove that every linear operator on a finite-dimensional com- 
plex vector space has an eigenvalue. Determinants are difficult, nonintuitive, 
and often defined without motivation. To prove the theorem about existence of 
eigenvalues on complex vector spaces, most books must define determinants, 
prove that a linear operator is not invertible if and only if its determinant equals 0, 
and then define the characteristic polynomial. This tortuous (torturous?) path 
gives students little feeling for why eigenvalues exist. 

In contrast, the simple determinant-free proofs presented here (for example, 
see 5.19) offer more insight. Once determinants have been moved to the end of 
the book, a new route opens to the main goal of linear algebra—understanding 
the structure of linear operators. 

This book starts at the beginning of the subject, with no prerequisites other 
than the usual demand for suitable mathematical maturity. A few examples 
and exercises involve calculus concepts such as continuity, differentiation, and 
integration. You can easily skip those examples and exercises if your students 
have not had calculus. If your students have had calculus, then those examples and 
exercises can enrich their experience by showing connections between different 
parts of mathematics. 

Even if your students have already seen some of the material in the first few 
chapters, they may be unaccustomed to working exercises of the type presented 
here, most of which require an understanding of proofs. 

Here is a chapter-by-chapter summary of the highlights of the book: 


• Chapter 1: Vector spaces are defined in this chapter, and their basic properties
are developed. 


• Chapter 2: Linear independence, span, basis, and dimension are defined in this
chapter, which presents the basic theory of finite-dimensional vector spaces. 


• Chapter 3: This chapter introduces linear maps. The key result here is the
fundamental theorem of linear maps: if T is a linear map on V, then dim V =
dim null T + dim range T. Quotient spaces and duality are topics in this chapter
at a higher level of abstraction than most of the book; these topics can be
skipped (except that duality is needed for tensor products in Section 9D).
• Chapter 4: The part of the theory of polynomials that will be needed to
understand linear operators is presented in this chapter. This chapter contains no
linear algebra. It can be covered quickly, especially if your students are already 
familiar with these results. 


• Chapter 5: The idea of studying a linear operator by restricting it to small
subspaces leads to eigenvectors in the early part of this chapter. The highlight of this
chapter is a simple proof that on complex vector spaces, eigenvalues always
exist. This result is then used to show that each linear operator on a complex vector
space has an upper-triangular matrix with respect to some basis. The minimal 
polynomial plays an important role here and later in the book. For example, this 
chapter gives a characterization of the diagonalizable operators in terms of the 
minimal polynomial. Section 5E can be skipped if you want to save some time. 


• Chapter 6: Inner product spaces are defined in this chapter, and their basic
properties are developed along with tools such as orthonormal bases and the 
Gram-Schmidt procedure. This chapter also shows how orthogonal projections 
can be used to solve certain minimization problems. The pseudoinverse is then 
introduced as a useful tool when the inverse does not exist. The material on 
the pseudoinverse can be skipped if you want to save some time. 


• Chapter 7: The spectral theorem, which characterizes the linear operators for
which there exists an orthonormal basis consisting of eigenvectors, is one of 
the highlights of this book. The work in earlier chapters pays off here with espe- 
cially simple proofs. This chapter also deals with positive operators, isometries, 
unitary operators, matrix factorizations, and especially the singular value de- 
composition, which leads to the polar decomposition and norms of linear maps. 


• Chapter 8: This chapter shows that for each operator on a complex vector space,
there is a basis of the vector space consisting of generalized eigenvectors of the 
operator. Then the generalized eigenspace decomposition describes a linear 
operator on a complex vector space. The multiplicity of an eigenvalue is defined 
as the dimension of the corresponding generalized eigenspace. These tools are 
used to prove that every invertible linear operator on a complex vector space 
has a square root. Then the chapter gives a proof that every linear operator on 
a complex vector space can be put into Jordan form. The chapter concludes 
with an investigation of the trace of operators. 


• Chapter 9: This chapter begins by looking at bilinear forms and showing that the
vector space of bilinear forms is the direct sum of the subspaces of symmetric 
bilinear forms and alternating bilinear forms. Then quadratic forms are diag- 
onalized. Moving to multilinear forms, the chapter shows that the subspace of 
alternating n-linear forms on an n-dimensional vector space has dimension one. 
This result leads to a clean basis-free definition of the determinant of an opera- 
tor. For complex vector spaces, the determinant turns out to equal the product of 
the eigenvalues, with each eigenvalue included in the product as many times as 
its multiplicity. The chapter concludes with an introduction to tensor products. 




This book usually develops linear algebra simultaneously for real and complex 
vector spaces by letting F denote either the real or the complex numbers. If you and 
your students prefer to think of F as an arbitrary field, then see the comments at the 
end of Section 1A. I prefer avoiding arbitrary fields at this level because they intro- 
duce extra abstraction without leading to any new linear algebra. Also, students are 
more comfortable thinking of polynomials as functions instead of the more formal 
objects needed for polynomials with coefficients in finite fields. Finally, even if the 
beginning part of the theory were developed with arbitrary fields, inner product 
spaces would push consideration back to just real and complex vector spaces. 

You probably cannot cover everything in this book in one semester. Going 
through all the material in the first seven or eight chapters during a one-semester 
course may require a rapid pace. If you must reach Chapter 9, then consider 
skipping the material on quotient spaces in Section 3E, skipping Section 3F 
on duality (unless you intend to cover tensor products in Section 9D), covering 
Chapter 4 on polynomials in a half hour, skipping Section 5E on commuting 
operators, and skipping the subsection in Section 6C on the pseudoinverse. 

A goal more important than teaching any particular theorem is to develop in 
students the ability to understand and manipulate the objects of linear algebra. 
Mathematics can be learned only by doing. Fortunately, linear algebra has many 
good homework exercises. When teaching this course, during each class I usually 
assign as homework several of the exercises, due the next class. Going over the 
homework might take up significant time in a typical class. 

Some of the exercises are intended to lead curious students into important 
topics beyond what might usually be included in a basic second course in linear 
algebra. 


The author’s top ten 


Listed below are the author’s ten favorite results in the book, in order of their 
appearance in the book. Students who leave your course with a good understanding 
of these crucial results will have an excellent foundation in linear algebra. 


• any two bases of a vector space have the same length (2.34)

• fundamental theorem of linear maps (3.21)

• existence of eigenvalues if F = C (5.19)

• upper-triangular form always exists if F = C (5.47)

• Cauchy–Schwarz inequality (6.14)

• Gram–Schmidt procedure (6.32)

• spectral theorem (7.29 and 7.31)

• singular value decomposition (7.70)

• generalized eigenspace decomposition theorem when F = C (8.22)

• dimension of alternating n-linear forms on V is 1 if dim V = n (9.37)
Major improvements and additions for the fourth edition


Over 250 new exercises and over 70 new examples. 


Increasing use of the minimal polynomial to provide cleaner proofs of multiple 
results, including necessary and sufficient conditions for an operator to have an 
upper-triangular matrix with respect to some basis (see Section 5C), necessary 
and sufficient conditions for diagonalizability (see Section 5D), and the real 
spectral theorem (see Section 7B). 


New section on commuting operators (see Section 5E). 
New subsection on pseudoinverse (see Section 6C). 
New subsections on QR factorization/Cholesky factorization (see Section 7D). 


Singular value decomposition now done for linear maps from an inner product 
space to another (possibly different) inner product space, rather than only deal- 
ing with linear operators from an inner product space to itself (see Section 7E). 


Polar decomposition now proved from singular value decomposition, rather than 
in the opposite order; this has led to cleaner proofs of both the singular value 
decomposition (see Section 7E) and the polar decomposition (see Section 7F). 


New subsection on norms of linear maps on finite-dimensional inner prod- 
uct spaces, using the singular value decomposition to avoid even mentioning 
supremum in the definition of the norm of a linear map (see Section 7F). 


New subsection on approximation by linear maps with lower-dimensional range 
(see Section 7F). 


New elementary proof of the important result that if T is an operator on a finite- 
dimensional complex vector space V, then there exists a basis of V consisting 
of generalized eigenvectors of T (see 8.9). 


New Chapter 9 on multilinear algebra, including bilinear forms, quadratic 
forms, multilinear forms, and tensor products. Determinants now are defined 
using a basis-free approach via alternating multilinear forms. 


New formatting to improve the student-friendly appearance of the book. For 
example, the definition and result boxes now have rounded corners instead of 
right-angle corners, for a gentler look. The main font size has been reduced 
from 11 point to 10.5 point. 


Please check the website below for additional links and information about the
book. Your suggestions, comments, and corrections are most welcome.

Best wishes for teaching a successful linear algebra class!


Sheldon Axler
San Francisco State University


website: https://linear.axler.net
e-mail: linear@axler.net

Contact the author, or Springer if the author is not available, for permission
for translations or other commercial reuse of the contents of this book.
Chapter 1
Vector Spaces 


Linear algebra is the study of linear maps on finite-dimensional vector spaces. 
Eventually we will learn what all these terms mean. In this chapter we will define 
vector spaces and discuss their elementary properties. 

In linear algebra, better theorems and more insight emerge if complex numbers 
are investigated along with real numbers. Thus we will begin by introducing the 
complex numbers and their basic properties. 

We will generalize the examples of a plane and of ordinary space to $\mathbf{R}^n$ and
$\mathbf{C}^n$, which we then will generalize to the notion of a vector space. As we will see,
a vector space is a set with operations of addition and scalar multiplication that 
satisfy natural algebraic properties. 

Then our next topic will be subspaces, which play a role for vector spaces 
analogous to the role played by subsets for sets. Finally, we will look at sums 
of subspaces (analogous to unions of subsets) and direct sums of subspaces 
(analogous to unions of disjoint sets). 


René Descartes explaining his work to Queen Christina of Sweden. 
Vector spaces are a generalization of the description of a plane 
using two coordinates, as published by Descartes in 1637. 
© Sheldon Axler 2024 


S. Axler, Linear Algebra Done Right, Undergraduate Texts in Mathematics, 
https://doi.org/10.1007/978-3-031-41026-0_1 


1A  $\mathbf{R}^n$ and $\mathbf{C}^n$


Complex Numbers 


You should already be familiar with basic properties of the set R of real numbers. 
Complex numbers were invented so that we can take square roots of negative 
numbers. The idea is to assume we have a square root of $-1$, denoted by i, that
obeys the usual rules of arithmetic. Here are the formal definitions. 


1.1 definition: complex numbers, C 


• A complex number is an ordered pair $(a, b)$, where $a, b \in \mathbf{R}$, but we will
write this as $a + bi$.


• The set of all complex numbers is denoted by C:

$$\mathbf{C} = \{a + bi : a, b \in \mathbf{R}\}.$$


• Addition and multiplication on C are defined by

$$(a + bi) + (c + di) = (a + c) + (b + d)i,$$
$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i;$$

here $a, b, c, d \in \mathbf{R}$.


If $a \in \mathbf{R}$, we identify $a + 0i$ with the real number a. Thus we think of R as a
subset of C. We usually write $0 + bi$ as just $bi$, and we usually write $0 + 1i$ as just i.

To motivate the definition of complex multiplication given above, pretend that
we knew that $i^2 = -1$ and then use the usual rules of arithmetic to derive the
formula above for the product of two complex numbers. Then use that formula
to verify that we indeed have

$$i^2 = -1.$$

The symbol i was first used to denote $\sqrt{-1}$ by Leonhard Euler in 1777.


Do not memorize the formula for the product of two complex numbers; you
can always rederive it by recalling that $i^2 = -1$ and then using the usual rules of
arithmetic (as given by 1.3). The next example illustrates this procedure.


1.2 example: complex arithmetic 


The product $(2 + 3i)(4 + 5i)$ can be evaluated by applying the distributive and
commutative properties from 1.3:

$$\begin{aligned}
(2 + 3i)(4 + 5i) &= 2 \cdot (4 + 5i) + (3i)(4 + 5i) \\
&= 2 \cdot 4 + 2 \cdot 5i + 3i \cdot 4 + (3i)(5i) \\
&= 8 + 10i + 12i - 15 \\
&= -7 + 22i.
\end{aligned}$$
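
The arithmetic in definition 1.1 can also be carried out mechanically. The following is a minimal Python sketch, not part of the text: it stores $a + bi$ as the ordered pair `(a, b)`, and the helper names `c_add` and `c_mul` are made up for this illustration.

```python
# Complex numbers stored as ordered pairs (a, b), standing for a + bi
# (definition 1.1). The names c_add and c_mul are illustrative only.

def c_add(z, w):
    """(a + bi) + (c + di) = (a + c) + (b + d)i"""
    a, b = z
    c, d = w
    return (a + c, b + d)

def c_mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i"""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

i = (0, 1)
print(c_mul(i, i))             # (-1, 0), that is, i^2 = -1
print(c_mul((2, 3), (4, 5)))   # (-7, 22), matching example 1.2

# Cross-check against Python's built-in complex type.
print(complex(2, 3) * complex(4, 5))   # (-7+22j)
```

Working with pairs of integers keeps every step exact, so the output can be compared directly with the hand computation.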




Our first result states that complex addition and complex multiplication have 
the familiar properties that we expect. 


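1.3 properties of complex arithmetic


commutativity
$\alpha + \beta = \beta + \alpha$ and $\alpha\beta = \beta\alpha$ for all $\alpha, \beta \in \mathbf{C}$.


associativity
$(\alpha + \beta) + \lambda = \alpha + (\beta + \lambda)$ and $(\alpha\beta)\lambda = \alpha(\beta\lambda)$ for all $\alpha, \beta, \lambda \in \mathbf{C}$.


identities
$\lambda + 0 = \lambda$ and $\lambda 1 = \lambda$ for all $\lambda \in \mathbf{C}$.


additive inverse
For every $\alpha \in \mathbf{C}$, there exists a unique $\beta \in \mathbf{C}$ such that $\alpha + \beta = 0$.


multiplicative inverse
For every $\alpha \in \mathbf{C}$ with $\alpha \ne 0$, there exists a unique $\beta \in \mathbf{C}$ such that $\alpha\beta = 1$.


distributive property
$\lambda(\alpha + \beta) = \lambda\alpha + \lambda\beta$ for all $\lambda, \alpha, \beta \in \mathbf{C}$.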


The properties above are proved using the familiar properties of real numbers 
and the definitions of complex addition and multiplication. The next example 
shows how commutativity of complex multiplication is proved. Proofs of the 
other properties above are left as exercises. 
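
Before attempting the proofs, it can help to watch the properties in 1.3 hold on concrete numbers. The sketch below is only a spot check on a small grid of complex numbers with integer parts, so it is evidence and not a proof. The names `c_add`, `c_mul`, and `properties_hold` are made up for this illustration, and the two inverse properties are not tested here because they need non-integer coordinates.

```python
from itertools import product

def c_add(z, w):
    return (z[0] + w[0], z[1] + w[1])

def c_mul(z, w):
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

# Complex numbers a + bi with integers a, b in {-2, ..., 2}.
samples = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]

def properties_hold(samples):
    """Check commutativity, associativity, identities, and the
    distributive property from 1.3 on every triple of samples."""
    zero, one = (0, 0), (1, 0)
    for alpha, beta, lam in product(samples, repeat=3):
        if c_add(alpha, beta) != c_add(beta, alpha):
            return False
        if c_mul(alpha, beta) != c_mul(beta, alpha):
            return False
        if c_add(c_add(alpha, beta), lam) != c_add(alpha, c_add(beta, lam)):
            return False
        if c_mul(c_mul(alpha, beta), lam) != c_mul(alpha, c_mul(beta, lam)):
            return False
        if c_add(lam, zero) != lam or c_mul(lam, one) != lam:
            return False
        if c_mul(lam, c_add(alpha, beta)) != c_add(c_mul(lam, alpha), c_mul(lam, beta)):
            return False
    return True

print(properties_hold(samples))   # True
```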


1.4 example: commutativity of complex multiplication 


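To show that $\alpha\beta = \beta\alpha$ for all $\alpha, \beta \in \mathbf{C}$, suppose

$$\alpha = a + bi \quad \text{and} \quad \beta = c + di,$$

where $a, b, c, d \in \mathbf{R}$. Then the definition of multiplication of complex numbers
shows that

$$\begin{aligned}
\alpha\beta &= (a + bi)(c + di) \\
&= (ac - bd) + (ad + bc)i
\end{aligned}$$

and

$$\begin{aligned}
\beta\alpha &= (c + di)(a + bi) \\
&= (ca - db) + (cb + da)i.
\end{aligned}$$

The equations above and the commutativity of multiplication and addition of real
numbers show that $\alpha\beta = \beta\alpha$.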




Next, we define the additive and multiplicative inverses of complex numbers, 
and then use those inverses to define subtraction and division operations with 
complex numbers. 


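1.5 definition: $-\alpha$, subtraction, $1/\alpha$, division


Suppose $\alpha, \beta \in \mathbf{C}$.


• Let $-\alpha$ denote the additive inverse of $\alpha$. Thus $-\alpha$ is the unique complex
number such that
$$\alpha + (-\alpha) = 0.$$


• Subtraction on C is defined by
$$\beta - \alpha = \beta + (-\alpha).$$


• For $\alpha \ne 0$, let $1/\alpha$ and $\frac{1}{\alpha}$ denote the multiplicative inverse of $\alpha$. Thus $1/\alpha$ is
the unique complex number such that
$$\alpha(1/\alpha) = 1.$$


• For $\alpha \ne 0$, division by $\alpha$ is defined by
$$\beta/\alpha = \beta(1/\alpha).$$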
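
Definition 1.5 builds subtraction and division out of the two inverses. The Python sketch below mirrors that construction with exact rational coordinates. It assumes the standard formula $1/(a + bi) = (a - bi)/(a^2 + b^2)$, which the text has not derived at this point, and the names `c_neg`, `c_inv`, `c_sub`, and `c_div` are made up for this illustration.

```python
from fractions import Fraction

def c_add(z, w):
    return (z[0] + w[0], z[1] + w[1])

def c_mul(z, w):
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

def c_neg(z):
    """Additive inverse: -(a + bi) = (-a) + (-b)i."""
    return (-z[0], -z[1])

def c_inv(z):
    """Multiplicative inverse of a + bi != 0, using the assumed
    formula (a - bi) / (a^2 + b^2)."""
    a, b = z
    norm_sq = a * a + b * b
    if norm_sq == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (Fraction(a) / norm_sq, Fraction(-b) / norm_sq)

def c_sub(beta, alpha):
    """beta - alpha = beta + (-alpha)"""
    return c_add(beta, c_neg(alpha))

def c_div(beta, alpha):
    """beta / alpha = beta (1 / alpha)"""
    return c_mul(beta, c_inv(alpha))

alpha = (3, 4)
print(c_add(alpha, c_neg(alpha)) == (0, 0))   # True
print(c_mul(alpha, c_inv(alpha)) == (1, 0))   # True
print(c_div((-7, 22), (2, 3)) == (4, 5))      # True, undoing example 1.2
```

Using `Fraction` avoids floating-point rounding, so the equalities are tested exactly.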


So that we can conveniently make definitions and prove theorems that apply 
to both real and complex numbers, we adopt the following notation. 


1.6 notation: F


Throughout this book, F stands for either R or C.


Thus if we prove a theorem involving 
F, we will know that it holds when F is 
replaced with R and when F is replaced 
with C. 

Elements of F are called scalars. The word “scalar” (which is just a fancy
word for “number”) is often used when we want to emphasize that an object is a
number, as opposed to a vector (vectors will be defined soon).

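The letter F is used because R and C are examples of what are called fields.

For $\alpha \in \mathbf{F}$ and m a positive integer, we define $\alpha^m$ to denote the product of $\alpha$
with itself m times:

$$\alpha^m = \underbrace{\alpha \cdots \alpha}_{m \text{ times}}.$$

This definition implies that

$$(\alpha^m)^n = \alpha^{mn} \quad \text{and} \quad (\alpha\beta)^m = \alpha^m \beta^m$$

for all $\alpha, \beta \in \mathbf{F}$ and all positive integers m, n.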
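
The two power identities can be tried on specific complex numbers. The following Python sketch computes $\alpha^m$ literally as a product of m copies of $\alpha$, with complex numbers stored as integer pairs. The names `c_mul` and `c_pow` and the sample values are made up for this illustration, and a single numerical check is of course not a proof.

```python
def c_mul(z, w):
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

def c_pow(z, m):
    """z^m for a positive integer m: the product of z with itself m times."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    result = z
    for _ in range(m - 1):
        result = c_mul(result, z)
    return result

alpha, beta = (1, 2), (3, -1)   # 1 + 2i and 3 - i
m, n = 3, 4

print(c_pow(alpha, 3))   # (-11, -2), that is, (1 + 2i)^3 = -11 - 2i
print(c_pow(c_pow(alpha, m), n) == c_pow(alpha, m * n))                      # True
print(c_pow(c_mul(alpha, beta), m) == c_mul(c_pow(alpha, m), c_pow(beta, m)))  # True
```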




Lists 


Before defining $\mathbf{R}^n$ and $\mathbf{C}^n$, we look at two important examples.


1.7 example: $\mathbf{R}^2$ and $\mathbf{R}^3$


• The set $\mathbf{R}^2$, which you can think of as a plane, is the set of all ordered pairs of
real numbers:
$$\mathbf{R}^2 = \{(x, y) : x, y \in \mathbf{R}\}.$$


• The set $\mathbf{R}^3$, which you can think of as ordinary space, is the set of all ordered
triples of real numbers:
$$\mathbf{R}^3 = \{(x, y, z) : x, y, z \in \mathbf{R}\}.$$


To generalize $\mathbf{R}^2$ and $\mathbf{R}^3$ to higher dimensions, we first need to discuss the
concept of lists.


1.8 definition: list, length 


• Suppose n is a nonnegative integer. A list of length n is an ordered
collection of n elements (which might be numbers, other lists, or more abstract
objects).


• Two lists are equal if and only if they have the same length and the same
elements in the same order.


Lists are often written as elements separated by commas and surrounded by
parentheses. Thus a list of length two is an ordered pair that might be written
as $(a, b)$. A list of length three is an ordered triple that might be written as
$(x, y, z)$. A list of length n might look like this:

$$(z_1, \dots, z_n).$$

Many mathematicians call a list of length n an n-tuple.

Sometimes we will use the word list without specifying its length. Remember,
however, that by definition each list has a finite length that is a nonnegative integer.
Thus an object that looks like $(x_1, x_2, \dots)$, which might be said to have infinite
length, is not a list.

A list of length 0 looks like this: ( ). We consider such an object to be a list 
so that some of our theorems will not have trivial exceptions. 

Lists differ from sets in two ways: in lists, order matters and repetitions have 
meaning; in sets, order and repetitions are irrelevant. 


1.9 example: lists versus sets 


• The lists (3, 5) and (5, 3) are not equal, but the sets {3, 5} and {5, 3} are equal.


• The lists (4, 4) and (4, 4, 4) are not equal (they do not have the same length),
although the sets {4, 4} and {4, 4, 4} both equal the set {4}.
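
The same distinction shows up in programming languages. As a rough analogy, and not a claim from the text, a Python `tuple` behaves like a list in the sense of definition 1.8, while a Python `set` behaves like a set. The variable names below are made up for this illustration.

```python
# Tuples model lists (order and repetition matter);
# Python sets model sets (order and repetition are irrelevant).

lists_differ_by_order = (3, 5) != (5, 3)
sets_ignore_order = {3, 5} == {5, 3}
lists_differ_by_length = (4, 4) != (4, 4, 4)
sets_ignore_repetition = {4, 4} == {4, 4, 4} == {4}
empty_list_length = len(())   # the list ( ) of length 0

print(lists_differ_by_order)     # True
print(sets_ignore_order)         # True
print(lists_differ_by_length)    # True
print(sets_ignore_repetition)    # True
print(empty_list_length)         # 0
```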
$\mathbf{F}^n$


To define the higher-dimensional analogues of $\mathbf{R}^2$ and $\mathbf{R}^3$, we will simply replace
R with F (which equals R or C) and replace the 2 or 3 with an arbitrary positive
integer.


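$\mathbf{F}^n$ is the set of all lists of length n of elements of F:
$$\mathbf{F}^n = \{(x_1, \dots, x_n) : x_k \in \mathbf{F} \text{ for } k = 1, \dots, n\}.$$


For $(x_1, \dots, x_n) \in \mathbf{F}^n$ and $k \in \{1, \dots, n\}$, we say that $x_k$ is the $k$th coordinate of
$(x_1, \dots, x_n)$.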


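If $\mathbf{F} = \mathbf{R}$ and n equals 2 or 3, then the definition above of $\mathbf{F}^n$ agrees with our
previous notions of $\mathbf{R}^2$ and $\mathbf{R}^3$.


1.12 example: $\mathbf{C}^4$


$\mathbf{C}^4$ is the set of all lists of four complex numbers:

$$\mathbf{C}^4 = \{(z_1, z_2, z_3, z_4) : z_1, z_2, z_3, z_4 \in \mathbf{C}\}.$$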


Read Flatland: A Romance of Many Dimensions, by Edwin A. Abbott, for an
amusing account of how $\mathbf{R}^3$ would be perceived by creatures living in $\mathbf{R}^2$.
This novel, published in 1884, may help you imagine a physical space of
four or more dimensions.

If $n \ge 4$, we cannot visualize $\mathbf{R}^n$ as a physical object. Similarly, $\mathbf{C}^1$ can be
thought of as a plane, but for $n \ge 2$, the human brain cannot provide a full image
of $\mathbf{C}^n$. However, even if n is large, we can perform algebraic manipulations in
$\mathbf{F}^n$ as easily as in $\mathbf{R}^2$ or $\mathbf{R}^3$. For example, addition in $\mathbf{F}^n$ is defined as follows.


Addition in $\mathbf{F}^n$ is defined by adding corresponding coordinates:

$$(x_1, \dots, x_n) + (y_1, \dots, y_n) = (x_1 + y_1, \dots, x_n + y_n).$$


Often the mathematics of $\mathbf{F}^n$ becomes cleaner if we use a single letter to denote
a list of n numbers, without explicitly writing the coordinates. For example, the
next result is stated with x and y in $\mathbf{F}^n$ even though the proof requires the more
cumbersome notation of $(x_1, \dots, x_n)$ and $(y_1, \dots, y_n)$.
1.14 commutativity of addition in $\mathbf{F}^n$


If $x, y \in \mathbf{F}^n$, then $x + y = y + x$.


Proof  Suppose $x = (x_1, \dots, x_n) \in \mathbf{F}^n$ and $y = (y_1, \dots, y_n) \in \mathbf{F}^n$. Then

$$\begin{aligned}
x + y &= (x_1, \dots, x_n) + (y_1, \dots, y_n) \\
&= (x_1 + y_1, \dots, x_n + y_n) \\
&= (y_1 + x_1, \dots, y_n + x_n) \\
&= (y_1, \dots, y_n) + (x_1, \dots, x_n) \\
&= y + x,
\end{aligned}$$

where the second and fourth equalities above hold because of the definition of
addition in $\mathbf{F}^n$ and the third equality holds because of the usual commutativity of
addition in F.
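
Coordinatewise addition translates directly into code. The Python sketch below represents an element of $\mathbf{F}^n$ as a tuple and spot-checks the commutativity result on one real and one complex example. The name `vec_add` and the sample vectors are made up for this illustration, and exact rationals stand in for real numbers.

```python
from fractions import Fraction

def vec_add(x, y):
    """Add two elements of F^n coordinate by coordinate."""
    if len(x) != len(y):
        raise ValueError("both lists must have the same length n")
    return tuple(xk + yk for xk, yk in zip(x, y))

# An example in R^3, with exact rational coordinates.
x = (Fraction(1, 2), 3, -4)
y = (2, Fraction(5, 3), 7)
print(vec_add(x, y) == vec_add(y, x))   # True

# An example in C^2, using Python's built-in complex numbers.
u = (1 + 2j, 3j)
v = (4, 5 - 1j)
print(vec_add(u, v))                    # ((5+2j), (5+2j))
print(vec_add(u, v) == vec_add(v, u))   # True
```

The length check reflects the fact that the sum is defined only for two lists of the same length n.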


The symbol ∎ means “end of proof”.

If a single letter is used to denote an element of $\mathbf{F}^n$, then the same letter with
appropriate subscripts is often used when coordinates must be displayed. For
example, if $x \in \mathbf{F}^n$, then letting x equal $(x_1, \dots, x_n)$ is good notation, as shown
in the proof above. Even better, work with just x and avoid explicit coordinates
when possible.


1.15 notation: 0


Let 0 denote the list of length n whose coordinates are all 0:

$$0 = (0, \dots, 0).$$


Here we are using the symbol 0 in two different ways: on the left side of the
equation above, the symbol 0 denotes a list of length n, which is an element of $\mathbf{F}^n$,
whereas on the right side, each 0 denotes a number. This potentially confusing
practice actually causes no problems because the context should always make
clear which 0 is intended.


1.16 example: context determines which 0 is intended 


Consider the statement that 0 is an additive identity for $\mathbf{F}^n$:
$$x + 0 = x \quad \text{for all } x \in \mathbf{F}^n.$$


Here the 0 above is the list defined in 1.15, not the number 0, because we have
not defined the sum of an element of $\mathbf{F}^n$ (namely, x) and the number 0.
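
A program has to make the same distinction explicitly. In the Python sketch below, the list 0 of length n and the number 0 are different objects, and adding a tuple to the number 0 fails because that sum was never defined. The names `zero` and `vec_add` are made up for this illustration.

```python
def zero(n):
    """The list of length n whose coordinates are all 0."""
    return (0,) * n

def vec_add(x, y):
    """Coordinatewise addition of two lists of the same length."""
    if len(x) != len(y):
        raise ValueError("both lists must have the same length n")
    return tuple(xk + yk for xk, yk in zip(x, y))

x = (2, -3, 17)
print(vec_add(x, zero(3)) == x)   # True: the list 0 is an additive identity

try:
    vec_add(x, 0)                 # the number 0 is not a list of length 3
except TypeError as err:
    print("undefined:", err)
```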
A picture can aid our intuition. We will draw pictures in $\mathbf{R}^2$ because we can sketch
this space on two-dimensional surfaces such as paper and computer screens. A
typical element of $\mathbf{R}^2$ is a point $v = (a, b)$. Sometimes we think of v not as a point
but as an arrow starting at the origin and ending at $(a, b)$, as shown here. When we
think of an element of $\mathbf{R}^2$ as an arrow, we refer to it as a vector.

[Figure: Elements of $\mathbf{R}^2$ can be thought of as points or as vectors.]

When we think of vectors in $\mathbf{R}^2$ as arrows, we can move an arrow parallel to itself
(not changing its length or direction) and still think of it as the same vector. With
that viewpoint, you will often gain better understanding by dispensing with the
coordinate axes and the explicit coordinates and just thinking of the vector, as
shown in the figure here. The two arrows shown here have the same length and
same direction, so we think of them as the same vector.

[Figure: A vector.]


Whenever we use pictures in $\mathbf{R}^2$ or use the somewhat vague language of
points and vectors, remember that these are just aids to our understanding, not
substitutes for the actual mathematics that we will develop. Although we cannot
draw good pictures in high-dimensional spaces, the elements of these spaces are
as rigorously defined as elements of $\mathbf{R}^2$.

For example, $(2, -3, 17, \pi, \sqrt{2})$ is an element of $\mathbf{R}^5$, and we may casually
refer to it as a point in $\mathbf{R}^5$ or a vector in $\mathbf{R}^5$ without worrying about whether the
geometry of $\mathbf{R}^5$ has any physical meaning.

Mathematical models of the economy can have thousands of variables, say
$x_1, \dots, x_{5000}$, which means that we must work in $\mathbf{R}^{5000}$. Such a space cannot be
dealt with geometrically. However, the algebraic approach works well. Thus
our subject is called linear algebra.

Recall that we defined the sum of two elements of $\mathbf{F}^n$ to be the element of $\mathbf{F}^n$
obtained by adding corresponding coordinates; see 1.13. As we will now see,
addition has a simple geometric interpretation in the special case of $\mathbf{R}^2$.

Suppose we have two vectors u and v in $\mathbf{R}^2$ that we want to add. Move the vector v
parallel to itself so that its initial point coincides with the end point of the vector u,
as shown here. The sum $u + v$ then equals the vector whose initial point equals the
initial point of u and whose end point equals the end point of the vector v, as
shown here.

[Figure: The sum of two vectors.]

In the next definition, the 0 on the right side of the displayed equation is the
list $0 \in \mathbf{F}^n$.
1.17 definition: additive inverse in $\mathbf{F}^n$, $-x$


For $x \in \mathbf{F}^n$, the additive inverse of x, denoted by $-x$, is the vector $-x \in \mathbf{F}^n$
such that
$$x + (-x) = 0.$$


Thus if $x = (x_1, \dots, x_n)$, then $-x = (-x_1, \dots, -x_n)$.


The additive inverse of a vector in $\mathbf{R}^2$ is the vector with the same length but
pointing in the opposite direction. The figure here illustrates this way of thinking
about the additive inverse in $\mathbf{R}^2$. As you can see, the vector labeled $-x$ has
the same length as the vector labeled x but points in the opposite direction.

[Figure: A vector and its additive inverse.]

Having dealt with addition in $\mathbf{F}^n$, we now turn to multiplication. We could
define a multiplication in $\mathbf{F}^n$ in a similar fashion, starting with two elements of
$\mathbf{F}^n$ and getting another element of $\mathbf{F}^n$ by multiplying corresponding coordinates.
Experience shows that this definition is not useful for our purposes. Another
type of multiplication, called scalar multiplication, will be central to our subject.
Specifically, we need to define what it means to multiply an element of $\mathbf{F}^n$ by an
element of F.


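1.18 definition: scalar multiplication in $\mathbf{F}^n$


The product of a number $\lambda$ and a vector in $\mathbf{F}^n$ is computed by multiplying
each coordinate of the vector by $\lambda$:

$$\lambda(x_1, \dots, x_n) = (\lambda x_1, \dots, \lambda x_n);$$

here $\lambda \in \mathbf{F}$ and $(x_1, \dots, x_n) \in \mathbf{F}^n$.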


Scalar multiplication has a nice geometric interpretation in $\mathbf{R}^2$. If $\lambda > 0$ and
$x \in \mathbf{R}^2$, then $\lambda x$ is the vector that points in the same direction as x and whose
length is $\lambda$ times the length of x. In other words, to get $\lambda x$, we shrink or stretch x
by a factor of $\lambda$, depending on whether $\lambda < 1$ or $\lambda > 1$.

If $\lambda < 0$ and $x \in \mathbf{R}^2$, then $\lambda x$ is the vector that points in the direction opposite
to that of x and whose length is $|\lambda|$ times the length of x, as shown here.

[Figure: Scalar multiplication.]

Scalar multiplication in $\mathbf{F}^n$ multiplies together a scalar and a vector, getting
a vector. In contrast, the dot product in $\mathbf{R}^2$ or $\mathbf{R}^3$ multiplies together two
vectors and gets a scalar. Generalizations of the dot product will become
important in Chapter 6.
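
The claim about lengths can be checked numerically. The Python sketch below implements scalar multiplication coordinate by coordinate and compares lengths in $\mathbf{R}^2$. It assumes the usual Euclidean length $\sqrt{a^2 + b^2}$ of $(a, b)$, which the text has not yet defined, and the names `scalar_mul` and `length` are made up for this illustration.

```python
import math

def scalar_mul(lam, x):
    """Multiply each coordinate of x by the scalar lam."""
    return tuple(lam * xk for xk in x)

def length(x):
    """Euclidean length of a vector in R^2 (an assumption of this sketch)."""
    return math.hypot(x[0], x[1])

x = (3.0, 4.0)   # a vector of length 5
for lam in (2.0, 0.5, -1.5):
    y = scalar_mul(lam, x)
    # The length of lam * x should be |lam| times the length of x.
    print(lam, y, math.isclose(length(y), abs(lam) * length(x)))

# Multiplying by -1 gives the additive inverse.
print(scalar_mul(-1, (1, -2)))   # (-1, 2)
```

Floating-point lengths are compared with `math.isclose` because square roots are generally not exact.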
Digression on Fields


A field is a set containing at least two distinct elements called 0 and 1, along with 
operations of addition and multiplication satisfying all properties listed in 1.3. 
Thus R and C are fields, as is the set of rational numbers along with the usual 
operations of addition and multiplication. Another example of a field is the set 
{0, 1} with the usual operations of addition and multiplication except that 1 + 1 is 
defined to equal 0. 
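
Because the set {0, 1} is so small, the properties listed in 1.3 can be verified for it by checking every case. The Python sketch below does this by brute force; the names `F2`, `add`, `mul`, and `is_field` are made up for this illustration.

```python
from itertools import product

F2 = (0, 1)

def add(a, b):
    return (a + b) % 2   # usual addition, except that 1 + 1 = 0

def mul(a, b):
    return a * b         # usual multiplication

def is_field():
    """Check every property listed in 1.3 on all elements of {0, 1}."""
    for a, b, c in product(F2, repeat=3):
        if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
            return False                       # commutativity
        if add(add(a, b), c) != add(a, add(b, c)):
            return False                       # associativity of addition
        if mul(mul(a, b), c) != mul(a, mul(b, c)):
            return False                       # associativity of multiplication
        if mul(c, add(a, b)) != add(mul(c, a), mul(c, b)):
            return False                       # distributive property
    for a in F2:
        if add(a, 0) != a or mul(a, 1) != a:
            return False                       # identities
        if len([b for b in F2 if add(a, b) == 0]) != 1:
            return False                       # unique additive inverse
        if a != 0 and len([b for b in F2 if mul(a, b) == 1]) != 1:
            return False                       # unique multiplicative inverse
    return True

print(add(1, 1))    # 0
print(is_field())   # True
```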

In this book we will not deal with fields other than R and C. However, many 
of the definitions, theorems, and proofs in linear algebra that work for the fields 
R and C also work without change for arbitrary fields. If you prefer to do so, 
throughout much of this book (except for Chapters 6 and 7, which deal with inner 
product spaces) you can think of F as denoting an arbitrary field instead of R 
or C. For results (except in the inner product chapters) that have as a hypothesis 
that F is C, you can probably replace that hypothesis with the hypothesis that F 
is an algebraically closed field, which means that every nonconstant polynomial 
with coefficients in F has a zero. A few results, such as Exercise 13 in Section 
1C, require the hypothesis on F that $1 + 1 \ne 0$.


Exercises 1A 


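1 Show that $\alpha + \beta = \beta + \alpha$ for all $\alpha, \beta \in \mathbf{C}$.

2 Show that $(\alpha + \beta) + \lambda = \alpha + (\beta + \lambda)$ for all $\alpha, \beta, \lambda \in \mathbf{C}$.

3 Show that $(\alpha\beta)\lambda = \alpha(\beta\lambda)$ for all $\alpha, \beta, \lambda \in \mathbf{C}$.

4 Show that $\lambda(\alpha + \beta) = \lambda\alpha + \lambda\beta$ for all $\lambda, \alpha, \beta \in \mathbf{C}$.

5 Show that for every $\alpha \in \mathbf{C}$, there exists a unique $\beta \in \mathbf{C}$ such that $\alpha + \beta = 0$.

6 Show that for every $\alpha \in \mathbf{C}$ with $\alpha \ne 0$, there exists a unique $\beta \in \mathbf{C}$ such
that $\alpha\beta = 1$.

7 Show that
$$\frac{-1 + \sqrt{3}\,i}{2}$$
is a cube root of 1 (meaning that its cube equals 1).

8 Find two distinct square roots of i.

9 Find $x \in \mathbf{R}^4$ such that
$$(4, -3, 1, 7) + 2x = (5, 9, -6, 8).$$

10 Explain why there does not exist $\lambda \in \mathbf{C}$ such that
$$\lambda(2 - 3i, 5 + 4i, -6 + 7i) = (12 - 5i, 7 + 22i, -32 - 9i).$$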
11 Show that $(x + y) + z = x + (y + z)$ for all $x, y, z \in \mathbf{F}^n$.

12 Show that $(ab)x = a(bx)$ for all $x \in \mathbf{F}^n$ and all $a, b \in \mathbf{F}$.

13 Show that $1x = x$ for all $x \in \mathbf{F}^n$.

14 Show that $\lambda(x + y) = \lambda x + \lambda y$ for all $\lambda \in \mathbf{F}$ and all $x, y \in \mathbf{F}^n$.

15 Show that $(a + b)x = ax + bx$ for all $a, b \in \mathbf{F}$ and all $x \in \mathbf{F}^n$.


“Can you do addition?” the White Queen asked. “What’s one and one and one
and one and one and one and one and one and one and one?”
“I don’t know,” said Alice. “I lost count.”


—Through the Looking Glass, Lewis Carroll 




1B  Definition of Vector Space


The motivation for the definition of a vector space comes from properties of 
addition and scalar multiplication in $\mathbf{F}^n$: Addition is commutative, associative,
and has an identity. Every element has an additive inverse. Scalar multiplication 
is associative. Scalar multiplication by 1 acts as expected. Addition and scalar 
multiplication are connected by distributive properties. 

We will define a vector space to be a set V with an addition and a scalar 
multiplication on V that satisfy the properties in the paragraph above. 


1.19 definition: addition, scalar multiplication 


• An addition on a set V is a function that assigns an element $u + v \in V$
  to each pair of elements $u, v \in V$.

• A scalar multiplication on a set V is a function that assigns an element
  $\lambda v \in V$ to each $\lambda \in \mathbf{F}$ and each $v \in V$.


Now we are ready to give the formal definition of a vector space. 


1.20 definition: vector space 


A vector space is a set V along with an addition on V and a scalar multiplication
on V such that the following properties hold.

commutativity
    $u + v = v + u$ for all $u, v \in V$.

associativity
    $(u + v) + w = u + (v + w)$ and $(ab)v = a(bv)$ for all $u, v, w \in V$ and for all
    $a, b \in \mathbf{F}$.

additive identity
    There exists an element $0 \in V$ such that $v + 0 = v$ for all $v \in V$.

additive inverse
    For every $v \in V$, there exists $w \in V$ such that $v + w = 0$.

multiplicative identity
    $1v = v$ for all $v \in V$.

distributive properties
    $a(u + v) = au + av$ and $(a + b)v = av + bv$ for all $a, b \in \mathbf{F}$ and all $u, v \in V$.
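The properties in the definition can be spot-checked on concrete vectors. The following is a minimal sketch, assuming vectors in $\mathbf{F}^n$ are represented as tuples of exact rational numbers; the helper names `add`, `scale`, and `check_axioms` are illustrative and not part of the text. Passing on finitely many samples is evidence, not a proof.

```python
from fractions import Fraction
from itertools import product

def add(u, v):
    """Coordinatewise addition in F^n."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, v):
    """Scalar multiplication in F^n."""
    return tuple(c * a for a in v)

def check_axioms(vectors, scalars):
    """Return True if every property in the definition holds on the samples."""
    n = len(vectors[0])
    zero = (Fraction(0),) * n
    for u, v, w in product(vectors, repeat=3):
        if add(u, v) != add(v, u):                  # commutativity
            return False
        if add(add(u, v), w) != add(u, add(v, w)):  # associativity of addition
            return False
    for v in vectors:
        if add(v, zero) != v:                       # additive identity
            return False
        if add(v, scale(Fraction(-1), v)) != zero:  # additive inverse
            return False
        if scale(Fraction(1), v) != v:              # multiplicative identity
            return False
    for a, b in product(scalars, repeat=2):
        for u, v in product(vectors, repeat=2):
            if scale(a * b, v) != scale(a, scale(b, v)):              # (ab)v = a(bv)
                return False
            if scale(a, add(u, v)) != add(scale(a, u), scale(a, v)):  # a(u+v) = au+av
                return False
            if scale(a + b, v) != add(scale(a, v), scale(b, v)):      # (a+b)v = av+bv
                return False
    return True

vectors = [tuple(Fraction(x) for x in t) for t in [(1, 2, 3), (0, -1, 4), (5, 0, -2)]]
scalars = [Fraction(2), Fraction(-3, 4), Fraction(0)]
print(check_axioms(vectors, scalars))
```

Exact rationals are used instead of floating-point numbers so that equalities such as associativity are not disturbed by rounding.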


The following geometric language sometimes aids our intuition. 


1.21 definition: vector, point 


Elements of a vector space are called vectors or points. 
The scalar multiplication in a vector space depends on F. Thus when we need
to be precise, we will say that V is a vector space over F instead of saying simply
that V is a vector space. For example, $\mathbf{R}^n$ is a vector space over R, and $\mathbf{C}^n$ is a
vector space over C.


1.22 definition: real vector space, complex vector space 


• A vector space over R is called a real vector space.

• A vector space over C is called a complex vector space.


Usually the choice of F is either clear from the context or irrelevant. Thus we 
often assume that F is lurking in the background without specifically mentioning it. 
With the usual operations of addition and scalar multiplication, $\mathbf{F}^n$ is a vector
space over F, as you should verify. The example of $\mathbf{F}^n$ motivated our definition
of vector space.

The simplest vector space is {0}, which contains only one point.


1.23 example: $\mathbf{F}^\infty$


$\mathbf{F}^\infty$ is defined to be the set of all sequences of elements of F:

    $$\mathbf{F}^\infty = \{(x_1, x_2, \dots) : x_k \in \mathbf{F} \text{ for } k = 1, 2, \dots\}.$$

Addition and scalar multiplication on $\mathbf{F}^\infty$ are defined as expected:

    $$(x_1, x_2, \dots) + (y_1, y_2, \dots) = (x_1 + y_1, x_2 + y_2, \dots),$$
    $$\lambda(x_1, x_2, \dots) = (\lambda x_1, \lambda x_2, \dots).$$

With these definitions, $\mathbf{F}^\infty$ becomes a vector space over F, as you should verify.
The additive identity in this vector space is the sequence of all 0’s.


Our next example of a vector space involves a set of functions. 


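1.24 notation: $\mathbf{F}^S$

• If S is a set, then $\mathbf{F}^S$ denotes the set of functions from S to F.

• For $f, g \in \mathbf{F}^S$, the sum $f + g \in \mathbf{F}^S$ is the function defined by
    $$(f + g)(x) = f(x) + g(x)$$
  for all $x \in S$.

• For $\lambda \in \mathbf{F}$ and $f \in \mathbf{F}^S$, the product $\lambda f \in \mathbf{F}^S$ is the function defined by
    $$(\lambda f)(x) = \lambda f(x)$$
  for all $x \in S$.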
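The pointwise operations in the notation above translate directly into code. The sketch below assumes functions from S to F are modeled as Python callables; the names `f_add` and `f_scale` and the sample set `S` are hypothetical choices made for illustration.

```python
def f_add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def f_scale(lam, f):
    """Pointwise scalar multiple: (lam f)(x) = lam * f(x)."""
    return lambda x: lam * f(x)

S = [0, 1, 2, 3]               # a small sample set S (hypothetical)
f = lambda x: x * x            # f(x) = x^2
g = lambda x: 3 * x + 1        # g(x) = 3x + 1
h = f_add(f, f_scale(2, g))    # the function f + 2g

values = [h(x) for x in S]     # values of x^2 + 6x + 2 on S
print(values)
```

The sum and scalar multiple are again functions on S, which is the sense in which these operations stay inside the set of functions from S to F.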
As an example of the notation above, if S is the interval [0, 1] and F = R, then
$\mathbf{R}^{[0,1]}$ is the set of real-valued functions on the interval [0, 1].
You should verify all three bullet points in the next example. 


1.25 example: $\mathbf{F}^S$ is a vector space


• If S is a nonempty set, then $\mathbf{F}^S$ (with the operations of addition and scalar
  multiplication as defined above) is a vector space over F.

• The additive identity of $\mathbf{F}^S$ is the function $0 : S \to \mathbf{F}$ defined by
    $$0(x) = 0$$
  for all $x \in S$.

• For $f \in \mathbf{F}^S$, the additive inverse of $f$ is the function $-f : S \to \mathbf{F}$ defined by
    $$(-f)(x) = -f(x)$$
  for all $x \in S$.


The vector space $\mathbf{F}^n$ is a special case of the vector space $\mathbf{F}^S$ because each
$(x_1, \dots, x_n) \in \mathbf{F}^n$ can be thought of as a function $x$ from the set $\{1, 2, \dots, n\}$ to F
by writing $x(k)$ instead of $x_k$ for the $k$th coordinate of $(x_1, \dots, x_n)$. In other words,
we can think of $\mathbf{F}^n$ as $\mathbf{F}^{\{1, 2, \dots, n\}}$. Similarly, we can think of $\mathbf{F}^\infty$ as $\mathbf{F}^{\{1, 2, \dots\}}$.

The elements of the vector space $\mathbf{R}^{[0,1]}$ are real-valued functions on [0, 1], not
lists. In general, a vector space is an abstract entity whose elements might
be lists, functions, or weird objects.

Soon we will see further examples of vector spaces, but first we need to develop
some of the elementary properties of vector spaces.

The definition of a vector space requires it to have an additive identity. The
next result states that this identity is unique.


1.26 unique additive identity 


A vector space has a unique additive identity. 


Proof Suppose 0 and 0’ are both additive identities for some vector space V. 
Then 


    $$0' = 0' + 0 = 0 + 0' = 0,$$


where the first equality holds because 0 is an additive identity, the second equality 
comes from commutativity, and the third equality holds because 0’ is an additive 
identity. Thus 0’ = 0, proving that V has only one additive identity. 


Each element v in a vector space has an additive inverse, an element w in the 
vector space such that v + w = 0. The next result shows that each element in a 
vector space has only one additive inverse. 

1.27 unique additive inverse


Every element in a vector space has a unique additive inverse.


Proof Suppose V is a vector space. Let $v \in V$. Suppose $w$ and $w'$ are additive
inverses of $v$. Then

    $$w = w + 0 = w + (v + w') = (w + v) + w' = 0 + w' = w'.$$


Thus w = w’, as desired. 


Because additive inverses are unique, the following notation now makes sense. 


Let $v, w \in V$. Then

• $-v$ denotes the additive inverse of $v$;

• $w - v$ is defined to be $w + (-v)$.


Almost all results in this book involve some vector space. To avoid having to
restate frequently that V is a vector space, we now make the necessary declaration
once and for all.


1.29 notation: V


For the rest of this book, V denotes a vector space over F.


In the next result, 0 denotes a scalar (the number 0 € F) on the left side of the 
equation and a vector (the additive identity of V) on the right side of the equation. 


1.30 the number 0 times a vector


$0v = 0$ for every $v \in V$.


Proof For $v \in V$, we have

    $$0v = (0 + 0)v = 0v + 0v.$$

Adding the additive inverse of $0v$ to both sides of the equation above gives $0 = 0v$,
as desired.


The result in 1.30 involves the additive identity of V and scalar multiplication.
The only part of the definition of a vector space that connects vector addition
and scalar multiplication is the distributive property. Thus the distributive
property must be used in the proof of 1.30.


In the next result, 0 denotes the additive identity of V. Although their proofs
are similar, 1.30 and 1.31 are not identical. More precisely, 1.30 states that the
product of the scalar 0 and any vector equals the vector 0, whereas 1.31 states that
the product of any scalar and the vector 0 equals the vector 0.

1.31 a number times the vector 0


$a0 = 0$ for every $a \in \mathbf{F}$.


Proof For $a \in \mathbf{F}$, we have

    $$a0 = a(0 + 0) = a0 + a0.$$


Adding the additive inverse of a0 to both sides of the equation above gives 0 = a0, 
as desired. 


Now we show that if an element of V is multiplied by the scalar —1, then the 
result is the additive inverse of the element of V. 


1.32 the number —1 times a vector 


$(-1)v = -v$ for every $v \in V$.


Proof For $v \in V$, we have

    $$v + (-1)v = 1v + (-1)v = \bigl(1 + (-1)\bigr)v = 0v = 0.$$


This equation says that (—1)v, when added to v, gives 0. Thus (—1)v is the 
additive inverse of v, as desired. 


Exercises 1B 


1 Prove that $-(-v) = v$ for every $v \in V$.

2 Suppose $a \in \mathbf{F}$, $v \in V$, and $av = 0$. Prove that $a = 0$ or $v = 0$.

3 Suppose $v, w \in V$. Explain why there exists a unique $x \in V$ such that
$v + 3x = w$.

4 The empty set is not a vector space. The empty set fails to satisfy only one
of the requirements listed in the definition of a vector space (1.20). Which
one?

5 Show that in the definition of a vector space (1.20), the additive inverse
condition can be replaced with the condition that

    $$0v = 0 \text{ for all } v \in V.$$

Here the 0 on the left side is the number 0, and the 0 on the right side is the
additive identity of V.

The phrase “a condition can be replaced” in a definition means that the
collection of objects satisfying the definition is unchanged if the original
condition is replaced with the new condition.

6 Let $\infty$ and $-\infty$ denote two distinct objects, neither of which is in R. Define
an addition and scalar multiplication on $\mathbf{R} \cup \{\infty, -\infty\}$ as you could guess
from the notation. Specifically, the sum and product of two real numbers is
as usual, and for $t \in \mathbf{R}$ define

    $$t\infty = \begin{cases} -\infty & \text{if } t < 0, \\ 0 & \text{if } t = 0, \\ \infty & \text{if } t > 0, \end{cases}
    \qquad
    t(-\infty) = \begin{cases} \infty & \text{if } t < 0, \\ 0 & \text{if } t = 0, \\ -\infty & \text{if } t > 0, \end{cases}$$

and

    $$t + \infty = \infty + t = \infty + \infty = \infty,$$
    $$t + (-\infty) = (-\infty) + t = (-\infty) + (-\infty) = -\infty,$$
    $$\infty + (-\infty) = (-\infty) + \infty = 0.$$

With these operations of addition and scalar multiplication, is $\mathbf{R} \cup \{\infty, -\infty\}$
a vector space over R? Explain.

7 Suppose S is a nonempty set. Let $V^S$ denote the set of functions from S to V.
Define a natural addition and scalar multiplication on $V^S$, and show that $V^S$
is a vector space with these definitions.

8 Suppose V is a real vector space.

• The complexification of V, denoted by $V_{\mathbf{C}}$, equals $V \times V$. An element of
  $V_{\mathbf{C}}$ is an ordered pair $(u, v)$, where $u, v \in V$, but we write this as $u + iv$.

• Addition on $V_{\mathbf{C}}$ is defined by
    $$(u_1 + iv_1) + (u_2 + iv_2) = (u_1 + u_2) + i(v_1 + v_2)$$
  for all $u_1, v_1, u_2, v_2 \in V$.

• Complex scalar multiplication on $V_{\mathbf{C}}$ is defined by
    $$(a + bi)(u + iv) = (au - bv) + i(av + bu)$$
  for all $a, b \in \mathbf{R}$ and all $u, v \in V$.

Prove that with the definitions of addition and scalar multiplication as above,
$V_{\mathbf{C}}$ is a complex vector space.

Think of V as a subset of $V_{\mathbf{C}}$ by identifying $u \in V$ with $u + i0$. The construction
of $V_{\mathbf{C}}$ from V can then be thought of as generalizing the construction
of $\mathbf{C}^n$ from $\mathbf{R}^n$.

1C Subspaces


By considering subspaces, we can greatly expand our examples of vector spaces. 


1.33 definition: subspace 


A subset U of V is called a subspace of V if U is also a vector space with the 
same additive identity, addition, and scalar multiplication as on V. 


Some people use the terminology linear subspace, which means the
same as subspace.

The next result gives the easiest way to check whether a subset of a vector
space is a subspace.


1.34 conditions for a subspace 


A subset U of V is a subspace of V if and only if U satisfies the following
three conditions.

additive identity
    $0 \in U$.

closed under addition
    $u, w \in U$ implies $u + w \in U$.

closed under scalar multiplication
    $a \in \mathbf{F}$ and $u \in U$ implies $au \in U$.


Proof If U is a subspace of V, then U satisfies the three conditions above by the
definition of vector space.

Conversely, suppose U satisfies the three conditions above. The first condition
ensures that the additive identity of V is in U. The second condition ensures
that addition makes sense on U. The third condition ensures that scalar
multiplication makes sense on U.

If $u \in U$, then $-u$ [which equals $(-1)u$ by 1.32] is also in U by the third
condition above. Hence every element of U has an additive inverse in U.

The other parts of the definition of a vector space, such as associativity and
commutativity, are automatically satisfied for U because they hold on the larger
space V. Thus U is a vector space and hence is a subspace of V.


The additive identity condition above could be replaced with the condition
that U is nonempty (because then taking $u \in U$ and multiplying it by 0
would imply that $0 \in U$). However, if a subset U of V is indeed a subspace,
then usually the quickest way to show that U is nonempty is to show
that $0 \in U$.


The three conditions in the result above usually enable us to determine quickly 
whether a given subset of V is a subspace of V. You should verify all assertions 
in the next example. 

1.35 example: subspaces


(a) If $b \in \mathbf{F}$, then
    $$\{(x_1, x_2, x_3, x_4) \in \mathbf{F}^4 : x_3 = 5x_4 + b\}$$
is a subspace of $\mathbf{F}^4$ if and only if $b = 0$.

(b) The set of continuous real-valued functions on the interval [0, 1] is a subspace
of $\mathbf{R}^{[0,1]}$.

(c) The set of differentiable real-valued functions on R is a subspace of $\mathbf{R}^{\mathbf{R}}$.

(d) The set of differentiable real-valued functions $f$ on the interval (0, 3) such
that $f'(2) = b$ is a subspace of $\mathbf{R}^{(0,3)}$ if and only if $b = 0$.

(e) The set of all sequences of complex numbers with limit 0 is a subspace of $\mathbf{C}^\infty$.
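The three subspace conditions can be spot-checked numerically for part (a) of the example above. The sketch below assumes elements of $\mathbf{F}^4$ are tuples of exact rationals; `in_U`, `sample`, and `passes_conditions` are hypothetical helper names. A sample-based check can expose a failure (as for $b = 1$) but cannot replace the verification asked for in the text.

```python
from fractions import Fraction
from itertools import product

def in_U(x, b):
    """Membership test for {(x1, x2, x3, x4) : x3 = 5*x4 + b}."""
    return x[2] == 5 * x[3] + b

def sample(b):
    """A few elements of the set, built by choosing x1, x2, x4 freely."""
    vals = [Fraction(-1), Fraction(0), Fraction(2)]
    return [(x1, x2, 5 * x4 + b, x4) for x1, x2, x4 in product(vals, repeat=3)]

def passes_conditions(b):
    """Check the three subspace conditions on the sample points."""
    pts = sample(b)
    zero = (Fraction(0),) * 4
    if not in_U(zero, b):                        # additive identity
        return False
    for u, w in product(pts, repeat=2):          # closed under addition
        if not in_U(tuple(p + q for p, q in zip(u, w)), b):
            return False
    for a in [Fraction(-3), Fraction(1, 2)]:     # closed under scalar multiplication
        for u in pts:
            if not in_U(tuple(a * p for p in u), b):
                return False
    return True

print(passes_conditions(Fraction(0)), passes_conditions(Fraction(1)))
```

For $b = 1$ the check already fails at the first condition, because the zero vector does not satisfy $x_3 = 5x_4 + 1$.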


Verifying some of the items above 
shows the linear structure underlying 
parts of calculus. For example, (b) above 
requires the result that the sum of two 
continuous functions is continuous. As 
another example, (d) above requires the 
result that for a constant c, the derivative 
of cf equals c times the derivative of f. 


The set {0} is the smallest subspace of 
V, and V itself is the largest subspace 
of V. The empty set is not a subspace 
of V because a subspace must be a 
vector space and hence must contain at 
least one element, namely, an additive 
identity. 


The subspaces of $\mathbf{R}^2$ are precisely {0}, all lines in $\mathbf{R}^2$ containing the origin,
and $\mathbf{R}^2$. The subspaces of $\mathbf{R}^3$ are precisely {0}, all lines in $\mathbf{R}^3$ containing the origin,
all planes in $\mathbf{R}^3$ containing the origin, and $\mathbf{R}^3$. To prove that all these objects are
indeed subspaces is straightforward; the hard part is to show that they are the
only subspaces of $\mathbf{R}^2$ and $\mathbf{R}^3$. That task will be easier after we introduce some
additional tools in the next chapter.


Sums of Subspaces 


When dealing with vector spaces, we are 
usually interested only in subspaces, as 
opposed to arbitrary subsets. The notion 
of the sum of subspaces will be useful. 


The union of subspaces is rarely a subspace (see Exercise 12), which is why
we usually work with sums rather than unions.


1.36 definition: sum of subspaces


Suppose $V_1, \dots, V_m$ are subspaces of V. The sum of $V_1, \dots, V_m$, denoted by
$V_1 + \cdots + V_m$, is the set of all possible sums of elements of $V_1, \dots, V_m$. More
precisely,

    $$V_1 + \cdots + V_m = \{v_1 + \cdots + v_m : v_1 \in V_1, \dots, v_m \in V_m\}.$$

Let’s look at some examples of sums of subspaces.


1.37 example: a sum of subspaces of $\mathbf{F}^3$


Suppose U is the set of all elements of $\mathbf{F}^3$ whose second and third coordinates
equal 0, and W is the set of all elements of $\mathbf{F}^3$ whose first and third coordinates
equal 0:

    $$U = \{(x, 0, 0) \in \mathbf{F}^3 : x \in \mathbf{F}\} \quad\text{and}\quad W = \{(0, y, 0) \in \mathbf{F}^3 : y \in \mathbf{F}\}.$$

Then

    $$U + W = \{(x, y, 0) \in \mathbf{F}^3 : x, y \in \mathbf{F}\},$$

as you should verify.


1.38 example: a sum of subspaces of $\mathbf{F}^4$


Suppose

    $$U = \{(x, x, y, y) \in \mathbf{F}^4 : x, y \in \mathbf{F}\} \quad\text{and}\quad W = \{(x, x, x, y) \in \mathbf{F}^4 : x, y \in \mathbf{F}\}.$$

Using words rather than symbols, we could say that U is the set of elements
of $\mathbf{F}^4$ whose first two coordinates equal each other and whose third and fourth
coordinates equal each other. Similarly, W is the set of elements of $\mathbf{F}^4$ whose first
three coordinates equal each other.

To find a description of $U + W$, consider a typical element $(a, a, b, b)$ of U and
a typical element $(c, c, c, d)$ of W, where $a, b, c, d \in \mathbf{F}$. We have

    $$(a, a, b, b) + (c, c, c, d) = (a + c, a + c, b + c, b + d),$$

which shows that every element of $U + W$ has its first two coordinates equal to
each other. Thus

    $$U + W \subseteq \{(x, x, y, z) \in \mathbf{F}^4 : x, y, z \in \mathbf{F}\}. \tag{1.39}$$

To prove the inclusion in the other direction, suppose $x, y, z \in \mathbf{F}$. Then

    $$(x, x, y, z) = (x, x, y, y) + (0, 0, 0, z - y),$$

where the first vector on the right is in U and the second vector on the right is
in W. Thus $(x, x, y, z) \in U + W$, showing that the inclusion 1.39 also holds in the
opposite direction. Hence

    $$U + W = \{(x, x, y, z) \in \mathbf{F}^4 : x, y, z \in \mathbf{F}\},$$

which shows that $U + W$ is the set of elements of $\mathbf{F}^4$ whose first two coordinates
equal each other.
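The decomposition used in the second half of the example can be carried out mechanically. The following sketch assumes elements of $\mathbf{F}^4$ are integer tuples; `in_U`, `in_W`, and `decompose` are illustrative names, and the split follows the formula $(x, x, y, z) = (x, x, y, y) + (0, 0, 0, z - y)$ from the text.

```python
def in_U(v):
    """U: first two coordinates equal, and last two coordinates equal."""
    return v[0] == v[1] and v[2] == v[3]

def in_W(v):
    """W: first three coordinates equal."""
    return v[0] == v[1] == v[2]

def decompose(v):
    """Split v = (x, x, y, z) as u + w with u in U and w in W."""
    x, x2, y, z = v
    if x != x2:
        raise ValueError("first two coordinates must be equal")
    u = (x, x, y, y)
    w = (0, 0, 0, z - y)
    return u, w

v = (3, 3, -1, 7)
u, w = decompose(v)
total = tuple(a + b for a, b in zip(u, w))
print(u, w, total)
```

This decomposition is one valid choice; the example does not claim that it is the only way to write a given vector as a sum of an element of U and an element of W.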


The next result states that the sum of subspaces is a subspace, and is in fact the 
smallest subspace containing all the summands (which means that every subspace 
containing all the summands also contains the sum). 

1.40 sum of subspaces is the smallest containing subspace


Suppose $V_1, \dots, V_m$ are subspaces of V. Then $V_1 + \cdots + V_m$ is the smallest
subspace of V containing $V_1, \dots, V_m$.


Proof The reader can verify that $V_1 + \cdots + V_m$ contains the additive identity 0
and is closed under addition and scalar multiplication. Thus 1.34 implies that
$V_1 + \cdots + V_m$ is a subspace of V.

The subspaces $V_1, \dots, V_m$ are all contained in $V_1 + \cdots + V_m$ (to see this, consider
sums $v_1 + \cdots + v_m$ where all except one of the $v_k$’s are 0). Conversely, every
subspace of V containing $V_1, \dots, V_m$ contains $V_1 + \cdots + V_m$ (because subspaces must
contain all finite sums of their elements). Thus $V_1 + \cdots + V_m$ is the smallest subspace
of V containing $V_1, \dots, V_m$.


Sums of subspaces in the theory of vector spaces are analogous to unions of
subsets in set theory. Given two subspaces of a vector space, the smallest
subspace containing them is their sum. Analogously, given two subsets of a set,
the smallest subset containing them is their union.


Direct Sums 


Suppose $V_1, \dots, V_m$ are subspaces of V. Every element of $V_1 + \cdots + V_m$ can be
written in the form

    $$v_1 + \cdots + v_m,$$

where each $v_k \in V_k$. Of special interest are cases in which each vector in
$V_1 + \cdots + V_m$ can be represented in the form above in only one way. This situation
is so important that it gets a special name (direct sum) and a special symbol ($\oplus$).


1.41 definition: direct sum, $\oplus$


Suppose $V_1, \dots, V_m$ are subspaces of V.

• The sum $V_1 + \cdots + V_m$ is called a direct sum if each element of $V_1 + \cdots + V_m$
  can be written in only one way as a sum $v_1 + \cdots + v_m$, where each $v_k \in V_k$.

• If $V_1 + \cdots + V_m$ is a direct sum, then $V_1 \oplus \cdots \oplus V_m$ denotes $V_1 + \cdots + V_m$,
  with the $\oplus$ notation serving as an indication that this is a direct sum.


1.42 example: a direct sum of two subspaces


Suppose U is the subspace of $\mathbf{F}^3$ of those vectors whose last coordinate equals 0,
and W is the subspace of $\mathbf{F}^3$ of those vectors whose first two coordinates equal 0:

    $$U = \{(x, y, 0) \in \mathbf{F}^3 : x, y \in \mathbf{F}\} \quad\text{and}\quad W = \{(0, 0, z) \in \mathbf{F}^3 : z \in \mathbf{F}\}.$$

Then $\mathbf{F}^3 = U \oplus W$, as you should verify.

1.43 example: a direct sum of multiple subspaces


Suppose $V_k$ is the subspace of $\mathbf{F}^n$ of those vectors whose coordinates are all
0, except possibly in the $k$th slot; for example, $V_2 = \{(0, x, 0, \dots, 0) \in \mathbf{F}^n : x \in \mathbf{F}\}$.
Then

    $$\mathbf{F}^n = V_1 \oplus \cdots \oplus V_n,$$

as you should verify.

To produce $\oplus$ in TeX, type \oplus.


Sometimes nonexamples add to our understanding as much as examples. 


1.44 example: a sum that is not a direct sum 


Suppose 
    $$V_1 = \{(x, y, 0) \in \mathbf{F}^3 : x, y \in \mathbf{F}\},$$
    $$V_2 = \{(0, 0, z) \in \mathbf{F}^3 : z \in \mathbf{F}\},$$
    $$V_3 = \{(0, y, y) \in \mathbf{F}^3 : y \in \mathbf{F}\}.$$

Then $\mathbf{F}^3 = V_1 + V_2 + V_3$ because every vector $(x, y, z) \in \mathbf{F}^3$ can be written as

    $$(x, y, z) = (x, y, 0) + (0, 0, z) + (0, 0, 0),$$

where the first vector on the right side is in $V_1$, the second vector is in $V_2$, and the
third vector is in $V_3$.

However, $\mathbf{F}^3$ does not equal the direct sum of $V_1, V_2, V_3$, because the vector
$(0, 0, 0)$ can be written in more than one way as a sum $v_1 + v_2 + v_3$, with each
$v_k \in V_k$. Specifically, we have

    $$(0, 0, 0) = (0, 1, 0) + (0, 0, 1) + (0, -1, -1)$$

and, of course,

    $$(0, 0, 0) = (0, 0, 0) + (0, 0, 0) + (0, 0, 0),$$

where the first vector on the right side of each equation above is in $V_1$, the second
vector is in $V_2$, and the third vector is in $V_3$. Thus the sum $V_1 + V_2 + V_3$ is not a
direct sum.
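The two representations of the zero vector in this nonexample can be checked directly. The sketch below assumes integer tuples for elements of $\mathbf{F}^3$; the membership tests `in_V1`, `in_V2`, `in_V3` and the helper `vec_sum` are hypothetical names introduced for illustration.

```python
def vec_sum(vectors):
    """Coordinatewise sum of a list of equal-length tuples."""
    return tuple(sum(coords) for coords in zip(*vectors))

def in_V1(v):
    return v[2] == 0                    # (x, y, 0)

def in_V2(v):
    return v[0] == 0 and v[1] == 0      # (0, 0, z)

def in_V3(v):
    return v[0] == 0 and v[1] == v[2]   # (0, y, y)

rep_a = [(0, 1, 0), (0, 0, 1), (0, -1, -1)]
rep_b = [(0, 0, 0), (0, 0, 0), (0, 0, 0)]

for rep in (rep_a, rep_b):
    # each summand lies in the corresponding subspace
    assert in_V1(rep[0]) and in_V2(rep[1]) and in_V3(rep[2])
    # and the summands add up to the zero vector
    assert vec_sum(rep) == (0, 0, 0)

print(rep_a != rep_b)  # two different representations of 0
```

Exhibiting two distinct representations of a single vector is enough to show that a sum is not direct.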


The symbol $\oplus$, which is a plus sign inside a circle, reminds us that we are
dealing with a special type of sum of subspaces: each element in the direct
sum can be represented in only one way as a sum of elements from the specified
subspaces.

The definition of direct sum requires every vector in the sum to have a unique
representation as an appropriate sum. The next result shows that when deciding
whether a sum of subspaces is a direct sum, we only need to consider whether 0
can be uniquely written as an appropriate sum.

1.45 condition for a direct sum


Suppose $V_1, \dots, V_m$ are subspaces of V. Then $V_1 + \cdots + V_m$ is a direct sum if
and only if the only way to write 0 as a sum $v_1 + \cdots + v_m$, where each $v_k \in V_k$,
is by taking each $v_k$ equal to 0.


Proof First suppose $V_1 + \cdots + V_m$ is a direct sum. Then the definition of direct
sum implies that the only way to write 0 as a sum $v_1 + \cdots + v_m$, where each $v_k \in V_k$,
is by taking each $v_k$ equal to 0.

Now suppose that the only way to write 0 as a sum $v_1 + \cdots + v_m$, where each
$v_k \in V_k$, is by taking each $v_k$ equal to 0. To show that $V_1 + \cdots + V_m$ is a direct
sum, let $v \in V_1 + \cdots + V_m$. We can write

    $$v = v_1 + \cdots + v_m$$

for some $v_1 \in V_1, \dots, v_m \in V_m$. To show that this representation is unique,
suppose we also have

    $$v = u_1 + \cdots + u_m,$$

where $u_1 \in V_1, \dots, u_m \in V_m$. Subtracting these two equations, we have

    $$0 = (v_1 - u_1) + \cdots + (v_m - u_m).$$

Because $v_1 - u_1 \in V_1, \dots, v_m - u_m \in V_m$, the equation above implies that each
$v_k - u_k$ equals 0. Thus $v_1 = u_1, \dots, v_m = u_m$, as desired.


The symbol $\Longleftrightarrow$ used below means “if and only if”; this symbol could also
be read to mean “is equivalent to”.

The next result gives a simple condition for testing whether a sum of two
subspaces is a direct sum.


1.46 direct sum of two subspaces 


Suppose U and W are subspaces of V. Then

    $$U + W \text{ is a direct sum} \iff U \cap W = \{0\}.$$


Proof First suppose that $U + W$ is a direct sum. If $v \in U \cap W$, then $0 = v + (-v)$,
where $v \in U$ and $-v \in W$. By the unique representation of 0 as the sum of a
vector in U and a vector in W, we have $v = 0$. Thus $U \cap W = \{0\}$, completing
the proof in one direction.

To prove the other direction, now suppose $U \cap W = \{0\}$. To prove that $U + W$
is a direct sum, suppose $u \in U$, $w \in W$, and

    $$0 = u + w.$$

To complete the proof, we only need to show that $u = w = 0$ (by 1.45). The
equation above implies that $u = -w \in W$. Thus $u \in U \cap W$. Hence $u = 0$, which
by the equation above implies that $w = 0$, completing the proof.

The result above deals only with the case of two subspaces. When asking
about a possible direct sum with more than two subspaces, it is not
enough to test that each pair of the subspaces intersect only at 0. To see
this, consider Example 1.44. In that nonexample of a direct sum, we have
$V_1 \cap V_2 = V_1 \cap V_3 = V_2 \cap V_3 = \{0\}$.

Sums of subspaces are analogous to unions of subsets. Similarly, direct
sums of subspaces are analogous to disjoint unions of subsets. No two
subspaces of a vector space can be disjoint, because both contain 0. So
disjointness is replaced, at least in the case of two subspaces, with the
requirement that the intersection equal {0}.


Exercises 1C


1 For each of the following subsets of $\mathbf{F}^3$, determine whether it is a subspace
of $\mathbf{F}^3$.

(a) $\{(x_1, x_2, x_3) \in \mathbf{F}^3 : x_1 + 2x_2 + 3x_3 = 0\}$

(b) $\{(x_1, x_2, x_3) \in \mathbf{F}^3 : x_1 + 2x_2 + 3x_3 = 4\}$

(c) $\{(x_1, x_2, x_3) \in \mathbf{F}^3 : x_1 x_2 x_3 = 0\}$

(d) $\{(x_1, x_2, x_3) \in \mathbf{F}^3 : x_1 = 5x_3\}$

2 Verify all assertions about subspaces in Example 1.35.

3 Show that the set of differentiable real-valued functions $f$ on the interval
$(-4, 4)$ such that $f'(-1) = 3f(2)$ is a subspace of $\mathbf{R}^{(-4,4)}$.

4 Suppose $b \in \mathbf{R}$. Show that the set of continuous real-valued functions $f$ on
the interval [0, 1] such that $\int_0^1 f = b$ is a subspace of $\mathbf{R}^{[0,1]}$ if and only if
$b = 0$.

5 Is $\mathbf{R}^2$ a subspace of the complex vector space $\mathbf{C}^2$?

6 (a) Is $\{(a, b, c) \in \mathbf{R}^3 : a^3 = b^3\}$ a subspace of $\mathbf{R}^3$?
(b) Is $\{(a, b, c) \in \mathbf{C}^3 : a^3 = b^3\}$ a subspace of $\mathbf{C}^3$?

7 Prove or give a counterexample: If U is a nonempty subset of $\mathbf{R}^2$ such that
U is closed under addition and under taking additive inverses (meaning
$-u \in U$ whenever $u \in U$), then U is a subspace of $\mathbf{R}^2$.

8 Give an example of a nonempty subset U of $\mathbf{R}^2$ such that U is closed under
scalar multiplication, but U is not a subspace of $\mathbf{R}^2$.

9 A function $f : \mathbf{R} \to \mathbf{R}$ is called periodic if there exists a positive number $p$
such that $f(x) = f(x + p)$ for all $x \in \mathbf{R}$. Is the set of periodic functions
from R to R a subspace of $\mathbf{R}^{\mathbf{R}}$? Explain.

10 Suppose $V_1$ and $V_2$ are subspaces of V. Prove that the intersection $V_1 \cap V_2$
is a subspace of V.

11 Prove that the intersection of every collection of subspaces of V is a subspace
of V.

12 Prove that the union of two subspaces of V is a subspace of V if and only if
one of the subspaces is contained in the other.

13 Prove that the union of three subspaces of V is a subspace of V if and only
if one of the subspaces contains the other two.

This exercise is surprisingly harder than Exercise 12, possibly because this
exercise is not true if we replace F with a field containing only two elements.

14 Suppose

    $$U = \{(x, -x, 2x) \in \mathbf{F}^3 : x \in \mathbf{F}\} \quad\text{and}\quad W = \{(x, x, 2x) \in \mathbf{F}^3 : x \in \mathbf{F}\}.$$

Describe $U + W$ using symbols, and also give a description of $U + W$ that
uses no symbols.

15 Suppose U is a subspace of V. What is $U + U$?

16 Is the operation of addition on the subspaces of V commutative? In other
words, if U and W are subspaces of V, is $U + W = W + U$?

17 Is the operation of addition on the subspaces of V associative? In other
words, if $V_1, V_2, V_3$ are subspaces of V, is

    $$(V_1 + V_2) + V_3 = V_1 + (V_2 + V_3)?$$

18 Does the operation of addition on the subspaces of V have an additive
identity? Which subspaces have additive inverses?

19 Prove or give a counterexample: If $V_1, V_2, U$ are subspaces of V such that

    $$V_1 + U = V_2 + U,$$

then $V_1 = V_2$.

20 Suppose

    $$U = \{(x, x, y, y) \in \mathbf{F}^4 : x, y \in \mathbf{F}\}.$$

Find a subspace W of $\mathbf{F}^4$ such that $\mathbf{F}^4 = U \oplus W$.

21 Suppose

    $$U = \{(x, y, x + y, x - y, 2x) \in \mathbf{F}^5 : x, y \in \mathbf{F}\}.$$

Find a subspace W of $\mathbf{F}^5$ such that $\mathbf{F}^5 = U \oplus W$.

22 Suppose

    $$U = \{(x, y, x + y, x - y, 2x) \in \mathbf{F}^5 : x, y \in \mathbf{F}\}.$$

Find three subspaces $W_1, W_2, W_3$ of $\mathbf{F}^5$, none of which equals {0}, such that
$\mathbf{F}^5 = U \oplus W_1 \oplus W_2 \oplus W_3$.

23 Prove or give a counterexample: If $V_1, V_2, U$ are subspaces of V such that

    $$V = V_1 \oplus U \quad\text{and}\quad V = V_2 \oplus U,$$

then $V_1 = V_2$.

Hint: When trying to discover whether a conjecture in linear algebra is true
or false, it is often useful to start by experimenting in $\mathbf{F}^2$.

24 A function $f : \mathbf{R} \to \mathbf{R}$ is called even if

    $$f(-x) = f(x)$$

for all $x \in \mathbf{R}$. A function $f : \mathbf{R} \to \mathbf{R}$ is called odd if

    $$f(-x) = -f(x)$$

for all $x \in \mathbf{R}$. Let $V_e$ denote the set of real-valued even functions on R
and let $V_o$ denote the set of real-valued odd functions on R. Show that
$\mathbf{R}^{\mathbf{R}} = V_e \oplus V_o$.


Chapter 2
Finite-Dimensional Vector Spaces


In the last chapter we learned about vector spaces. Linear algebra focuses not 
on arbitrary vector spaces, but on finite-dimensional vector spaces, which we 
introduce in this chapter. 

We begin this chapter by considering linear combinations of lists of vectors. 
This leads us to the crucial concept of linear independence. The linear dependence 
lemma will become one of our most useful tools. 

A list of vectors in a vector space that is small enough to be linearly independent 
and big enough so the linear combinations of the list fill up the vector space is 
called a basis of the vector space. We will see that every basis of a vector space 
has the same length, which will allow us to define the dimension of a vector space. 

This chapter ends with a formula for the dimension of the sum of two subspaces. 


standing assumptions for this chapter 


• F denotes R or C.
• V denotes a vector space over F.


The main building of the Institute for Advanced Study, in Princeton, New Jersey. 
Paul Halmos (1916-2006) wrote the first modern linear algebra book in this building. 
Halmos’s linear algebra book was published in 1942 (second edition published in 1958). 
The title of Halmos’s book was the same as the title of this chapter. 

2A Span and Linear Independence


We have been writing lists of numbers surrounded by parentheses, and we will
continue to do so for elements of $\mathbf{F}^n$; for example, $(2, -7, 8) \in \mathbf{F}^3$. However, now
we need to consider lists of vectors (which may be elements of $\mathbf{F}^n$ or of other
vector spaces). To avoid confusion, we will usually write lists of vectors without
surrounding parentheses. For example, $(4, 1, 6), (9, 5, 7)$ is a list of length two of
vectors in $\mathbf{R}^3$.


Linear Combinations and Span 


A sum of scalar multiples of the vectors in a list is called a linear combination of 
the list. Here is the formal definition. 


2.2 definition: linear combination


A linear combination of a list $v_1, \dots, v_m$ of vectors in V is a vector of the form

    $$a_1 v_1 + \cdots + a_m v_m,$$

where $a_1, \dots, a_m \in \mathbf{F}$.


2.3 example: linear combinations in $\mathbf{R}^3$


• $(17, -4, 2)$ is a linear combination of $(2, 1, -3), (1, -2, 4)$, which is a list of
  length two of vectors in $\mathbf{R}^3$, because

    $$(17, -4, 2) = 6(2, 1, -3) + 5(1, -2, 4).$$

• $(17, -4, 5)$ is not a linear combination of $(2, 1, -3), (1, -2, 4)$, which is a list
  of length two of vectors in $\mathbf{R}^3$, because there do not exist numbers $a_1, a_2 \in \mathbf{F}$
  such that

    $$(17, -4, 5) = a_1(2, 1, -3) + a_2(1, -2, 4).$$

  In other words, the system of equations

    $$17 = 2a_1 + a_2$$
    $$-4 = a_1 - 2a_2$$
    $$5 = -3a_1 + 4a_2$$

  has no solutions (as you should verify).
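Both bullet points of the example can be checked with exact arithmetic. The sketch below is an illustration rather than a general solver: the hypothetical function `coefficients` solves the first two coordinate equations by Cramer's rule and then tests the third, which assumes the first two coordinates of the two vectors form an invertible 2-by-2 system (true for the vectors in this example).

```python
from fractions import Fraction

def coefficients(target, v1, v2):
    """Return (a1, a2) with target = a1*v1 + a2*v2, or None if none exist.

    Assumes the first two coordinates of v1 and v2 determine a1 and a2
    uniquely; raises ValueError otherwise.
    """
    det = v1[0] * v2[1] - v2[0] * v1[1]
    if det == 0:
        raise ValueError("first two coordinates do not determine a1, a2")
    # Cramer's rule on the first two coordinate equations
    a1 = Fraction(target[0] * v2[1] - v2[0] * target[1], det)
    a2 = Fraction(v1[0] * target[1] - target[0] * v1[1], det)
    # the third coordinate must agree as well
    if a1 * v1[2] + a2 * v2[2] != target[2]:
        return None
    return a1, a2

v1, v2 = (2, 1, -3), (1, -2, 4)
first = coefficients((17, -4, 2), v1, v2)    # a solution exists
second = coefficients((17, -4, 5), v1, v2)   # no solution exists
print(first, second)
```

The first call recovers the coefficients 6 and 5 used in the text; the second returns `None` because the values forced by the first two equations give 2, not 5, in the third coordinate.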

The set of all linear combinations of a list of vectors $v_1, \dots, v_m$ in V is called
the span of $v_1, \dots, v_m$, denoted by $\operatorname{span}(v_1, \dots, v_m)$. In other words,

    $$\operatorname{span}(v_1, \dots, v_m) = \{a_1 v_1 + \cdots + a_m v_m : a_1, \dots, a_m \in \mathbf{F}\}.$$

The span of the empty list ( ) is defined to be {0}.


2.5 example: span 


The previous example shows that in $\mathbf{F}^3$,
• $(17, -4, 2) \in \operatorname{span}\bigl((2, 1, -3), (1, -2, 4)\bigr)$;
• $(17, -4, 5) \notin \operatorname{span}\bigl((2, 1, -3), (1, -2, 4)\bigr)$.


Some mathematicians use the term linear span, which means the same as 
span. 


2.6 span is the smallest containing subspace


The span of a list of vectors in V is the smallest subspace of V containing all
vectors in the list.


Proof Suppose $v_1, \dots, v_m$ is a list of vectors in V.

First we show that $\operatorname{span}(v_1, \dots, v_m)$ is a subspace of V. The additive identity is
in $\operatorname{span}(v_1, \dots, v_m)$ because

    $$0 = 0v_1 + \cdots + 0v_m.$$

Also, $\operatorname{span}(v_1, \dots, v_m)$ is closed under addition because

    $$(a_1 v_1 + \cdots + a_m v_m) + (c_1 v_1 + \cdots + c_m v_m) = (a_1 + c_1)v_1 + \cdots + (a_m + c_m)v_m.$$

Furthermore, $\operatorname{span}(v_1, \dots, v_m)$ is closed under scalar multiplication because

    $$\lambda(a_1 v_1 + \cdots + a_m v_m) = \lambda a_1 v_1 + \cdots + \lambda a_m v_m.$$

Thus $\operatorname{span}(v_1, \dots, v_m)$ is a subspace of V (by 1.34).

Each $v_k$ is a linear combination of $v_1, \dots, v_m$ (to show this, set $a_k = 1$ and let
the other $a$’s in 2.2 equal 0). Thus $\operatorname{span}(v_1, \dots, v_m)$ contains each $v_k$. Conversely,
because subspaces are closed under scalar multiplication and addition, every
subspace of V that contains each $v_k$ contains $\operatorname{span}(v_1, \dots, v_m)$. Thus $\operatorname{span}(v_1, \dots, v_m)$
is the smallest subspace of V containing all the vectors $v_1, \dots, v_m$.


If $\operatorname{span}(v_1, \dots, v_m)$ equals V, we say that the list $v_1, \dots, v_m$ spans V.

2.8 example: a list that spans $\mathbf{F}^n$


Suppose $n$ is a positive integer. We want to show that

    $$(1, 0, \dots, 0), (0, 1, 0, \dots, 0), \dots, (0, \dots, 0, 1)$$

spans $\mathbf{F}^n$. Here the $k$th vector in the list above has 1 in the $k$th slot and 0 in all other
slots.

Suppose $(x_1, \dots, x_n) \in \mathbf{F}^n$. Then

    $$(x_1, \dots, x_n) = x_1(1, 0, \dots, 0) + x_2(0, 1, 0, \dots, 0) + \cdots + x_n(0, \dots, 0, 1).$$

Thus $(x_1, \dots, x_n) \in \operatorname{span}\bigl((1, 0, \dots, 0), (0, 1, 0, \dots, 0), \dots, (0, \dots, 0, 1)\bigr)$, as desired.
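The expansion in the example above can be reproduced for a concrete vector. This sketch assumes integer tuples for elements of $\mathbf{F}^n$; `standard_basis` and `linear_combination` are illustrative names, not notation from the text.

```python
def standard_basis(n):
    """The list (1,0,...,0), (0,1,0,...,0), ..., (0,...,0,1) in F^n."""
    return [tuple(1 if j == k else 0 for j in range(n)) for k in range(n)]

def linear_combination(coeffs, vectors):
    """Compute coeffs[0]*vectors[0] + ... + coeffs[m-1]*vectors[m-1]."""
    n = len(vectors[0])
    return tuple(sum(c * v[j] for c, v in zip(coeffs, vectors)) for j in range(n))

x = (4, -7, 2, 9)
basis = standard_basis(len(x))
# using the coordinates of x as coefficients reproduces x itself
rebuilt = linear_combination(x, basis)
print(rebuilt == x)
```

The coefficients needed are exactly the coordinates of the vector, which is the content of the displayed equation in the example.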


Now we can make one of the key definitions in linear algebra. 


2.9 definition: finite-dimensional vector space 


A vector space is called finite-dimensional if some list of vectors in it spans 
the space. 


Recall that by definition every list has finite length.

Example 2.8 above shows that $\mathbf{F}^n$ is a finite-dimensional vector space for every
positive integer $n$.

The definition of a polynomial is no doubt already familiar to you.


2.10 definition: polynomial, P(F) 


• A function $p : \mathbf{F} \to \mathbf{F}$ is called a polynomial with coefficients in F if there
  exist $a_0, \dots, a_m \in \mathbf{F}$ such that

    $$p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_m z^m$$

  for all $z \in \mathbf{F}$.

• $\mathcal{P}(\mathbf{F})$ is the set of all polynomials with coefficients in F.


With the usual operations of addition and scalar multiplication, $\mathcal{P}(\mathbf{F})$ is a
vector space over F, as you should verify. Hence $\mathcal{P}(\mathbf{F})$ is a subspace of $\mathbf{F}^{\mathbf{F}}$, the
vector space of functions from F to F.

If a polynomial (thought of as a function from F to F) is represented by two 
sets of coefficients, then subtracting one representation of the polynomial from 
the other produces a polynomial that is identically zero as a function on F and 
hence has all zero coefficients (if you are unfamiliar with this fact, just believe 
it for now; we will prove it later—see 4.8). Conclusion: the coefficients of a 
polynomial are uniquely determined by the polynomial. Thus the next definition 
uniquely defines the degree of a polynomial. 




2.11 definition: degree of a polynomial, $\deg p$


• A polynomial $p \in \mathcal{P}(\mathbf{F})$ is said to have degree $m$ if there exist scalars
$a_0, a_1, \dots, a_m \in \mathbf{F}$ with $a_m \ne 0$ such that for every $z \in \mathbf{F}$, we have
$$p(z) = a_0 + a_1 z + \cdots + a_m z^m.$$
• The polynomial that is identically $0$ is said to have degree $-\infty$.
• The degree of a polynomial $p$ is denoted by $\deg p$.


In the next definition, we use the convention that $-\infty < m$, which means that
the polynomial $0$ is in $\mathcal{P}_m(\mathbf{F})$.


2.12 notation: $\mathcal{P}_m(\mathbf{F})$


For $m$ a nonnegative integer, $\mathcal{P}_m(\mathbf{F})$ denotes the set of all polynomials with
coefficients in $\mathbf{F}$ and degree at most $m$.


If $m$ is a nonnegative integer, then $\mathcal{P}_m(\mathbf{F}) = \operatorname{span}(1, z, \dots, z^m)$ [here we slightly
abuse notation by letting $z^k$ denote a function]. Thus $\mathcal{P}_m(\mathbf{F})$ is a finite-dimensional
vector space for each nonnegative integer $m$.


2.13 definition: infinite-dimensional vector space


A vector space is called infinite-dimensional if it is not finite-dimensional.


2.14 example: P(F) is infinite-dimensional. 


Consider any list of elements of $\mathcal{P}(\mathbf{F})$. Let $m$ denote the highest degree of the
polynomials in this list. Then every polynomial in the span of this list has degree
at most $m$. Thus $z^{m+1}$ is not in the span of our list. Hence no list spans $\mathcal{P}(\mathbf{F})$.
Thus $\mathcal{P}(\mathbf{F})$ is infinite-dimensional.


Linear Independence 
Suppose $v_1, \dots, v_m \in V$ and $v \in \operatorname{span}(v_1, \dots, v_m)$. By the definition of span, there
exist $a_1, \dots, a_m \in \mathbf{F}$ such that
$$v = a_1 v_1 + \cdots + a_m v_m.$$
Consider the question of whether the choice of scalars in the equation above is
unique. Suppose $c_1, \dots, c_m$ is another set of scalars such that
$$v = c_1 v_1 + \cdots + c_m v_m.$$
Subtracting the last two equations, we have
$$0 = (a_1 - c_1) v_1 + \cdots + (a_m - c_m) v_m.$$
Thus we have written $0$ as a linear combination of $(v_1, \dots, v_m)$. If the only way
to do this is by using $0$ for all the scalars in the linear combination, then each
$a_k - c_k$ equals $0$, which means that each $a_k$ equals $c_k$ (and thus the choice of
scalars was indeed unique). This situation is so important that we give it a special
name, linear independence, which we now define.


2.15 definition: linearly independent 


• A list $v_1, \dots, v_m$ of vectors in $V$ is called linearly independent if the only
choice of $a_1, \dots, a_m \in \mathbf{F}$ that makes
$$a_1 v_1 + \cdots + a_m v_m = 0$$
is $a_1 = \cdots = a_m = 0$.

• The empty list $(\,)$ is also declared to be linearly independent.


The reasoning above shows that $v_1, \dots, v_m$ is linearly independent if and only if
each vector in $\operatorname{span}(v_1, \dots, v_m)$ has only one representation as a linear combination
of $v_1, \dots, v_m$.


2.16 example: linearly independent lists 


(a) To see that the list $(1,0,0,0), (0,1,0,0), (0,0,1,0)$ is linearly independent in
$\mathbf{F}^4$, suppose $a_1, a_2, a_3 \in \mathbf{F}$ and
$$a_1(1,0,0,0) + a_2(0,1,0,0) + a_3(0,0,1,0) = (0,0,0,0).$$
Thus
$$(a_1, a_2, a_3, 0) = (0,0,0,0).$$
Hence $a_1 = a_2 = a_3 = 0$. Thus the list $(1,0,0,0), (0,1,0,0), (0,0,1,0)$ is
linearly independent in $\mathbf{F}^4$.

(b) Suppose $m$ is a nonnegative integer. To see that the list $1, z, \dots, z^m$ is linearly
independent in $\mathcal{P}(\mathbf{F})$, suppose $a_0, a_1, \dots, a_m \in \mathbf{F}$ and
$$a_0 + a_1 z + \cdots + a_m z^m = 0,$$
where we think of both sides as elements of $\mathcal{P}(\mathbf{F})$. Then
$$a_0 + a_1 z + \cdots + a_m z^m = 0$$
for all $z \in \mathbf{F}$. As discussed earlier (and as follows from 4.8), this implies
that $a_0 = a_1 = \cdots = a_m = 0$. Thus $1, z, \dots, z^m$ is a linearly independent list in
$\mathcal{P}(\mathbf{F})$.

(c) A list of length one in a vector space is linearly independent if and only if the
vector in the list is not $0$.

(d) A list of length two in a vector space is linearly independent if and only if
neither of the two vectors in the list is a scalar multiple of the other.


If some vectors are removed from a linearly independent list, the remaining
list is also linearly independent, as you should verify.


2.17 definition: linearly dependent 


• A list of vectors in $V$ is called linearly dependent if it is not linearly
independent.

• In other words, a list $v_1, \dots, v_m$ of vectors in $V$ is linearly dependent if there
exist $a_1, \dots, a_m \in \mathbf{F}$, not all $0$, such that $a_1 v_1 + \cdots + a_m v_m = 0$.


2.18 example: linearly dependent lists


• $(2,3,1), (1,-1,2), (7,3,8)$ is linearly dependent in $\mathbf{F}^3$ because
$$2(2,3,1) + 3(1,-1,2) + (-1)(7,3,8) = (0,0,0).$$
• The list $(2,3,1), (1,-1,2), (7,3,c)$ is linearly dependent in $\mathbf{F}^3$ if and only if
$c = 8$, as you should verify.
• If some vector in a list of vectors in $V$ is a linear combination of the other
vectors, then the list is linearly dependent. (Proof: After writing one vector in
the list as equal to a linear combination of the other vectors, move that vector
to the other side of the equation, where it will be multiplied by $-1$.)
• Every list of vectors in $V$ containing the $0$ vector is linearly dependent. (This is
a special case of the previous bullet point.)
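
As a computational aside, the examples above can be checked by machine. The following is only a minimal sketch, not part of the text's development: it assumes vectors with rational entries (Python's `Fraction` standing in for $\mathbf{F}$) and uses Gaussian elimination, and the helper names `pivot_count` and `is_linearly_independent` are our own. A list is linearly independent exactly when elimination produces as many pivots as there are vectors.

```python
from fractions import Fraction


def pivot_count(vectors):
    """Count the pivots found by Gaussian elimination on the vectors (as rows)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    pivots = 0
    for col in range(len(rows[0]) if rows else 0):
        # Look for an unused row with a nonzero entry in this column.
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        # Clear this column in every row below the pivot row.
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def is_linearly_independent(vectors):
    """Independent exactly when no vector is eliminated to the zero row."""
    return pivot_count(vectors) == len(vectors)


print(is_linearly_independent([(2, 3, 1), (1, -1, 2), (7, 3, 8)]))  # False
print(is_linearly_independent([(2, 3, 1), (1, -1, 2), (7, 3, 9)]))  # True
print(is_linearly_independent([(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]))  # True
```

The empty list gives zero pivots and has length zero, so the sketch agrees with the convention that the empty list is linearly independent.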


The next lemma is a terrific tool. It states that given a linearly dependent list 
of vectors, one of the vectors is in the span of the previous ones. Furthermore, we 
can throw out that vector without changing the span of the original list. 


2.19 linear dependence lemma 


Suppose $v_1, \dots, v_m$ is a linearly dependent list in $V$. Then there exists
$k \in \{1, 2, \dots, m\}$ such that
$$v_k \in \operatorname{span}(v_1, \dots, v_{k-1}).$$
Furthermore, if $k$ satisfies the condition above and the $k$th term is removed
from $v_1, \dots, v_m$, then the span of the remaining list equals $\operatorname{span}(v_1, \dots, v_m)$.


Proof Because the list $v_1, \dots, v_m$ is linearly dependent, there exist numbers
$a_1, \dots, a_m \in \mathbf{F}$, not all $0$, such that
$$a_1 v_1 + \cdots + a_m v_m = 0.$$
Let $k$ be the largest element of $\{1, \dots, m\}$ such that $a_k \ne 0$. Then
$$v_k = -\frac{a_1}{a_k} v_1 - \cdots - \frac{a_{k-1}}{a_k} v_{k-1},$$
which proves that $v_k \in \operatorname{span}(v_1, \dots, v_{k-1})$, as desired.

Now suppose $k$ is any element of $\{1, \dots, m\}$ such that $v_k \in \operatorname{span}(v_1, \dots, v_{k-1})$.
Let $b_1, \dots, b_{k-1} \in \mathbf{F}$ be such that
$$v_k = b_1 v_1 + \cdots + b_{k-1} v_{k-1}. \tag{2.20}$$
Suppose $u \in \operatorname{span}(v_1, \dots, v_m)$. Then there exist $c_1, \dots, c_m \in \mathbf{F}$ such that
$$u = c_1 v_1 + \cdots + c_m v_m.$$
In the equation above, we can replace $v_k$ with the right side of 2.20, which shows
that $u$ is in the span of the list obtained by removing the $k$th term from $v_1, \dots, v_m$.
Thus removing the $k$th term of the list $v_1, \dots, v_m$ does not change the span of the
list.


If $k = 1$ in the linear dependence lemma, then $v_k \in \operatorname{span}(v_1, \dots, v_{k-1})$ means
that $v_1 = 0$, because $\operatorname{span}(\,) = \{0\}$. Note also that parts of the proof of the linear
dependence lemma need to be modified if k = 1. In general, the proofs in the 
rest of the book will not call attention to special cases that must be considered 
involving lists of length 0, the subspace {0}, or other trivial cases for which the 
result is true but needs a slightly different proof. Be sure to check these special 
cases yourself. 


2.21 example: smallest k in linear dependence lemma 


Consider the list 
(1, 2, 3), (6,5, 4), (15, 16, 17), (8,9, 7) 


in $\mathbf{R}^3$. This list of length four is linearly dependent, as we will soon see. Thus the
linear dependence lemma implies that there exists $k \in \{1, 2, 3, 4\}$ such that the $k$th
vector in this list is a linear combination of the previous vectors in the list. Let's
see how to find the smallest value of $k$ that works.

Taking $k = 1$ in the linear dependence lemma works if and only if the first
vector in the list equals $0$. Because $(1, 2, 3)$ is not the $0$ vector, we cannot take
$k = 1$ for this list.

Taking $k = 2$ in the linear dependence lemma works if and only if the second
vector in the list is a scalar multiple of the first vector. However, there does not
exist $c \in \mathbf{R}$ such that $(6, 5, 4) = c(1, 2, 3)$. Thus we cannot take $k = 2$ for this list.

Taking $k = 3$ in the linear dependence lemma works if and only if the third
vector in the list is a linear combination of the first two vectors. Thus for the list
in this example, we want to know whether there exist $a, b \in \mathbf{R}$ such that
$$(15, 16, 17) = a(1, 2, 3) + b(6, 5, 4).$$
The equation above is equivalent to a system of three linear equations in the two
unknowns $a, b$. Using Gaussian elimination or appropriate software, we find that
$a = 3$, $b = 2$ is a solution of the equation above, as you can verify. Thus for the
list in this example, $k = 3$ is the smallest value of $k$ that works in the linear
dependence lemma.
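
The search for the smallest $k$ can also be carried out mechanically. The sketch below is one possible way to do so, under the assumption that the entries are rational numbers; the names `pivot_count` and `smallest_k` are hypothetical helpers, not notation from the text. It uses the fact that $v_k \in \operatorname{span}(v_1, \dots, v_{k-1})$ exactly when adjoining $v_k$ to the previous vectors does not increase the number of pivots found by Gaussian elimination.

```python
from fractions import Fraction


def pivot_count(vectors):
    """Count the pivots found by Gaussian elimination on the vectors (as rows)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    pivots = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def smallest_k(vectors):
    """Smallest k (counting from 1) with v_k in span(v_1, ..., v_{k-1}).

    Returns None when the list is linearly independent.
    """
    for k in range(1, len(vectors) + 1):
        # Adjoining v_k adds no pivot exactly when v_k is in the earlier span.
        if pivot_count(vectors[:k]) == pivot_count(vectors[:k - 1]):
            return k
    return None


print(smallest_k([(1, 2, 3), (6, 5, 4), (15, 16, 17), (8, 9, 7)]))  # 3
```

For $k = 1$ the earlier list is empty, so the test reduces to asking whether $v_1 = 0$, in agreement with the remark that $\operatorname{span}(\,) = \{0\}$.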
Now we come to a key result. It says that no linearly independent list in $V$ is
longer than a spanning list in $V$.


2.22 length of linearly independent list $\le$ length of spanning list


In a finite-dimensional vector space, the length of every linearly independent
list of vectors is less than or equal to the length of every spanning list of
vectors.


Proof Suppose that $u_1, \dots, u_m$ is linearly independent in $V$. Suppose also that
$w_1, \dots, w_n$ spans $V$. We need to prove that $m \le n$. We do so through the process
described below with $m$ steps; note that in each step we add one of the $u$'s and
remove one of the $w$'s.


Step 1
Let $B$ be the list $w_1, \dots, w_n$, which spans $V$. Adjoining $u_1$ at the beginning of
this list produces a linearly dependent list (because $u_1$ can be written as a linear
combination of $w_1, \dots, w_n$). In other words, the list
$$u_1, w_1, \dots, w_n$$
is linearly dependent.

Thus by the linear dependence lemma (2.19), one of the vectors in the list above
is a linear combination of the previous vectors in the list. We know that $u_1 \ne 0$
because the list $u_1, \dots, u_m$ is linearly independent. Thus $u_1$ is not in the span
of the previous vectors in the list above (because $u_1$ is not in $\{0\}$, which is the
span of the empty list). Hence the linear dependence lemma implies that we
can remove one of the $w$'s so that the new list $B$ (of length $n$) consisting of $u_1$
and the remaining $w$'s spans $V$.


Step $k$, for $k = 2, \dots, m$
The list $B$ (of length $n$) from step $k - 1$ spans $V$. In particular, $u_k$ is in the span of
the list $B$. Thus the list of length $(n + 1)$ obtained by adjoining $u_k$ to $B$, placing
it just after $u_1, \dots, u_{k-1}$, is linearly dependent. By the linear dependence lemma
(2.19), one of the vectors in this list is in the span of the previous ones, and
because $u_1, \dots, u_k$ is linearly independent, this vector cannot be one of the $u$'s.

Hence there still must be at least one remaining $w$ at this step. We can remove
from our new list (after adjoining $u_k$ in the proper place) a $w$ that is a linear
combination of the previous vectors in the list, so that the new list $B$ (of length
$n$) consisting of $u_1, \dots, u_k$ and the remaining $w$'s spans $V$.


After step $m$, we have added all the $u$'s and the process stops. At each step
as we add a $u$ to $B$, the linear dependence lemma implies that there is some $w$ to
remove. Thus there are at least as many $w$'s as $u$'s.
The next two examples show how the result above can be used to show, without
any computations, that certain lists are not linearly independent and that certain
lists do not span a given vector space.


2.23 example: no list of length 4 is linearly independent in $\mathbf{R}^3$


The list $(1,0,0), (0,1,0), (0,0,1)$, which has length three, spans $\mathbf{R}^3$. Thus no
list of length larger than three is linearly independent in $\mathbf{R}^3$.

For example, we now know that $(1,2,3), (4,5,8), (9,6,7), (-3,2,8)$, which
is a list of length four, is not linearly independent in $\mathbf{R}^3$.


2.24 example: no list of length 3 spans $\mathbf{R}^4$


The list $(1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)$, which has length four, is
linearly independent in $\mathbf{R}^4$. Thus no list of length less than four spans $\mathbf{R}^4$.

For example, we now know that $(1,2,3,-5), (4,5,8,3), (9,6,7,-1)$, which
is a list of length three, does not span $\mathbf{R}^4$.


Our intuition suggests that every subspace of a finite-dimensional vector space 
should also be finite-dimensional. We now prove that this intuition is correct. 


2.25 finite-dimensional subspaces


Every subspace of a finite-dimensional vector space is finite-dimensional.


Proof Suppose $V$ is finite-dimensional and $U$ is a subspace of $V$. We need to
prove that $U$ is finite-dimensional. We do this through the following multistep
construction.


Step 1
If $U = \{0\}$, then $U$ is finite-dimensional and we are done. If $U \ne \{0\}$, then
choose a nonzero vector $u_1 \in U$.


Step $k$
If $U = \operatorname{span}(u_1, \dots, u_{k-1})$, then $U$ is finite-dimensional and we are done. If
$U \ne \operatorname{span}(u_1, \dots, u_{k-1})$, then choose a vector $u_k \in U$ such that
$$u_k \notin \operatorname{span}(u_1, \dots, u_{k-1}).$$


After each step, as long as the process continues, we have constructed a list 
of vectors such that no vector in this list is in the span of the previous vectors. 
Thus after each step we have constructed a linearly independent list, by the linear 
dependence lemma (2.19). This linearly independent list cannot be longer than 
any spanning list of V (by 2.22). Thus the process eventually terminates, which 
means that U is finite-dimensional. 
Exercises 2A


1 Find a list of four distinct vectors in $\mathbf{F}^3$ whose span equals
$$\{(x, y, z) \in \mathbf{F}^3 : x + y + z = 0\}.$$
2 Prove or give a counterexample: If $v_1, v_2, v_3, v_4$ spans $V$, then the list
$$v_1 - v_2,\ v_2 - v_3,\ v_3 - v_4,\ v_4$$
also spans $V$.

3 Suppose $v_1, \dots, v_m$ is a list of vectors in $V$. For $k \in \{1, \dots, m\}$, let
$$w_k = v_1 + \cdots + v_k.$$
Show that $\operatorname{span}(v_1, \dots, v_m) = \operatorname{span}(w_1, \dots, w_m)$.


4 (a) Show that a list of length one in a vector space is linearly independent 
if and only if the vector in the list is not 0. 
(b) Show that a list of length two in a vector space is linearly independent 
if and only if neither of the two vectors in the list is a scalar multiple of 
the other. 


5 Find a number $t$ such that
$$(3, 1, 4),\ (2, 3, 5),\ (5, 9, t)$$
is not linearly independent in $\mathbf{R}^3$.


6 Show that the list $(2,3,1), (1,-1,2), (7,3,c)$ is linearly dependent in $\mathbf{F}^3$ if
and only if $c = 8$.


7 (a) Show that if we think of $\mathbf{C}$ as a vector space over $\mathbf{R}$, then the list
$1 + i, 1 - i$ is linearly independent.
(b) Show that if we think of $\mathbf{C}$ as a vector space over $\mathbf{C}$, then the list
$1 + i, 1 - i$ is linearly dependent.


8 Suppose $v_1, v_2, v_3, v_4$ is linearly independent in $V$. Prove that the list
$$v_1 - v_2,\ v_2 - v_3,\ v_3 - v_4,\ v_4$$
is also linearly independent.


9 Prove or give a counterexample: If $v_1, v_2, \dots, v_m$ is a linearly independent
list of vectors in $V$, then
$$5v_1 - 4v_2,\ v_2,\ v_3,\ \dots,\ v_m$$
is linearly independent.


10 Prove or give a counterexample: If $v_1, v_2, \dots, v_m$ is a linearly independent
list of vectors in $V$ and $\lambda \in \mathbf{F}$ with $\lambda \ne 0$, then $\lambda v_1, \lambda v_2, \dots, \lambda v_m$ is linearly
independent.


11 Prove or give a counterexample: If $v_1, \dots, v_m$ and $w_1, \dots, w_m$ are linearly
independent lists of vectors in $V$, then the list $v_1 + w_1, \dots, v_m + w_m$ is linearly
independent.


12 Suppose $v_1, \dots, v_m$ is linearly independent in $V$ and $w \in V$. Prove that if
$v_1 + w, \dots, v_m + w$ is linearly dependent, then $w \in \operatorname{span}(v_1, \dots, v_m)$.


13 Suppose $v_1, \dots, v_m$ is linearly independent in $V$ and $w \in V$. Show that
$$v_1, \dots, v_m, w \text{ is linearly independent} \iff w \notin \operatorname{span}(v_1, \dots, v_m).$$


14 Suppose $v_1, \dots, v_m$ is a list of vectors in $V$. For $k \in \{1, \dots, m\}$, let
$$w_k = v_1 + \cdots + v_k.$$
Show that the list $v_1, \dots, v_m$ is linearly independent if and only if the list
$w_1, \dots, w_m$ is linearly independent.


15 Explain why there does not exist a list of six polynomials that is linearly
independent in $\mathcal{P}_4(\mathbf{F})$.


16 Explain why no list of four polynomials spans $\mathcal{P}_4(\mathbf{F})$.


17 Prove that $V$ is infinite-dimensional if and only if there is a sequence $v_1, v_2, \dots$
of vectors in $V$ such that $v_1, \dots, v_m$ is linearly independent for every positive
integer $m$.


18 Prove that $\mathbf{F}^\infty$ is infinite-dimensional.


19 Prove that the real vector space of all continuous real-valued functions on
the interval $[0, 1]$ is infinite-dimensional.


20 Suppose $p_0, p_1, \dots, p_m$ are polynomials in $\mathcal{P}_m(\mathbf{F})$ such that $p_k(2) = 0$ for each
$k \in \{0, \dots, m\}$. Prove that $p_0, p_1, \dots, p_m$ is not linearly independent in $\mathcal{P}_m(\mathbf{F})$.
2B Bases


In the previous section, we discussed linearly independent lists and we also 
discussed spanning lists. Now we bring these concepts together by considering 
lists that have both properties. 


2.26 definition: basis 


A basis of V is a list of vectors in V that is linearly independent and spans V. 


2.27 example: bases 


(a) The list $(1,0,\dots,0), (0,1,0,\dots,0), \dots, (0,\dots,0,1)$ is a basis of $\mathbf{F}^n$, called the
standard basis of $\mathbf{F}^n$.

(b) The list $(1,2), (3,5)$ is a basis of $\mathbf{F}^2$. Note that this list has length two, which
is the same as the length of the standard basis of $\mathbf{F}^2$. In the next section, we
will see that this is not a coincidence.

(c) The list $(1,2,-4), (7,-5,6)$ is linearly independent in $\mathbf{F}^3$ but is not a basis
of $\mathbf{F}^3$ because it does not span $\mathbf{F}^3$.

(d) The list $(1,2), (3,5), (4,13)$ spans $\mathbf{F}^2$ but is not a basis of $\mathbf{F}^2$ because it is not
linearly independent.

(e) The list $(1,1,0), (0,0,1)$ is a basis of $\{(x,x,y) \in \mathbf{F}^3 : x, y \in \mathbf{F}\}$.

(f) The list $(1,-1,0), (1,0,-1)$ is a basis of
$$\{(x,y,z) \in \mathbf{F}^3 : x + y + z = 0\}.$$
(g) The list $1, z, \dots, z^m$ is a basis of $\mathcal{P}_m(\mathbf{F})$, called the standard basis of $\mathcal{P}_m(\mathbf{F})$.


In addition to the standard basis, $\mathbf{F}^n$ has many other bases. For example,
$$(7,5), (-4,9) \quad \text{and} \quad (1,2), (3,5)$$
are both bases of $\mathbf{F}^2$.

The next result helps explain why bases are useful. Recall that "uniquely"
means "in only one way".


2.28 criterion for basis 


A list $v_1, \dots, v_n$ of vectors in $V$ is a basis of $V$ if and only if every $v \in V$ can
be written uniquely in the form
$$v = a_1 v_1 + \cdots + a_n v_n, \tag{2.29}$$
where $a_1, \dots, a_n \in \mathbf{F}$.


This proof is essentially a repetition of the ideas that led us to the definition of
linear independence.

Proof First suppose that $v_1, \dots, v_n$ is a basis of $V$. Let $v \in V$. Because $v_1, \dots, v_n$
spans $V$, there exist $a_1, \dots, a_n \in \mathbf{F}$ such that 2.29 holds. To show that the
representation in 2.29 is unique, suppose $c_1, \dots, c_n$ are scalars such that we also have
$$v = c_1 v_1 + \cdots + c_n v_n.$$
Subtracting the last equation from 2.29, we get
$$0 = (a_1 - c_1) v_1 + \cdots + (a_n - c_n) v_n.$$
This implies that each $a_k - c_k$ equals $0$ (because $v_1, \dots, v_n$ is linearly independent).
Hence $a_1 = c_1, \dots, a_n = c_n$. We have the desired uniqueness, completing the proof
in one direction.

For the other direction, suppose every $v \in V$ can be written uniquely in the
form given by 2.29. This implies that the list $v_1, \dots, v_n$ spans $V$. To show that
$v_1, \dots, v_n$ is linearly independent, suppose $a_1, \dots, a_n \in \mathbf{F}$ are such that
$$0 = a_1 v_1 + \cdots + a_n v_n.$$
The uniqueness of the representation 2.29 (taking $v = 0$) now implies that
$a_1 = \cdots = a_n = 0$. Thus $v_1, \dots, v_n$ is linearly independent and hence is a basis
of $V$.
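
To make the uniqueness in 2.28 concrete, here is a small sketch that computes the scalars $a_1, \dots, a_n$ for a given basis of $\mathbf{F}^n$. It is an illustration under stated assumptions, not part of the proof: the entries are taken to be rational, the function name `coordinates` is our own, and the method (Gauss-Jordan elimination on the system whose columns are the basis vectors) is one standard choice among several.

```python
from fractions import Fraction


def coordinates(basis, v):
    """Return [a_1, ..., a_n] with v = a_1 v_1 + ... + a_n v_n.

    Assumes `basis` is a list of n vectors in F^n; raises ValueError if the
    list turns out not to be a basis.
    """
    n = len(v)
    if len(basis) != n or any(len(b) != n for b in basis):
        raise ValueError("expected n vectors of length n")
    # Augmented matrix: the basis vectors are the columns, v is the last column.
    aug = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
           for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the given list is not a basis")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


# (7, 12) = 1*(1, 2) + 2*(3, 5)
print(coordinates([(1, 2), (3, 5)], (7, 12)))
```

Taking $v = 0$ returns all zeros, which is the linear independence half of 2.28; a list that is not a basis, such as $(1,2), (2,4)$, is rejected.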


A spanning list in a vector space may not be a basis because it is not linearly 
independent. Our next result says that given any spanning list, some (possibly 
none) of the vectors in it can be discarded so that the remaining list is linearly 
independent and still spans the vector space. 

As an example in the vector space $\mathbf{F}^2$, if the procedure in the proof below is
applied to the list $(1,2), (3,6), (4,7), (5,9)$, then the second and fourth vectors
will be removed. This leaves $(1,2), (4,7)$, which is a basis of $\mathbf{F}^2$.


2.30 every spanning list contains a basis 


Every spanning list in a vector space can be reduced to a basis of the vector 
space. 


Proof Suppose $v_1, \dots, v_n$ spans $V$. We want to remove some of the vectors from
$v_1, \dots, v_n$ so that the remaining vectors form a basis of $V$. We do this through the
multistep process described below.

Start with $B$ equal to the list $v_1, \dots, v_n$.


Step 1
If $v_1 = 0$, then delete $v_1$ from $B$. If $v_1 \ne 0$, then leave $B$ unchanged.


Step $k$
If $v_k$ is in $\operatorname{span}(v_1, \dots, v_{k-1})$, then delete $v_k$ from the list $B$. If $v_k$ is not in
$\operatorname{span}(v_1, \dots, v_{k-1})$, then leave $B$ unchanged.

Stop the process after step $n$, getting a list $B$. This list $B$ spans $V$ because
our original list spanned $V$ and we have discarded only vectors that were already
in the span of the previous vectors. The process ensures that no vector in $B$ is
in the span of the previous ones. Thus $B$ is linearly independent, by the linear
dependence lemma (2.19). Hence $B$ is a basis of $V$.
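
The multistep process in the proof above translates almost directly into code. The following sketch assumes rational entries and uses hypothetical helper names (`pivot_count`, `reduce_to_basis`). One small liberty is taken: it tests $v_k$ against the vectors kept so far rather than against all of $v_1, \dots, v_{k-1}$, which gives the same answer because the discarded vectors were already in the span of the kept ones.

```python
from fractions import Fraction


def pivot_count(vectors):
    """Count the pivots found by Gaussian elimination on the vectors (as rows)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    pivots = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def reduce_to_basis(vectors):
    """Delete each vector that lies in the span of the vectors before it."""
    kept = []
    for v in vectors:
        # v is outside span(kept) exactly when adjoining it adds a pivot.
        if pivot_count(kept + [v]) > pivot_count(kept):
            kept.append(v)
    return kept


print(reduce_to_basis([(1, 2), (3, 6), (4, 7), (5, 9)]))  # [(1, 2), (4, 7)]
```

The printed result matches the example preceding 2.30, where the second and fourth vectors are removed.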


We now come to an important corollary of the previous result. 


2.31 basis of finite-dimensional vector space 


Every finite-dimensional vector space has a basis. 


Proof By definition, a finite-dimensional vector space has a spanning list. The 
previous result tells us that each spanning list can be reduced to a basis. 


Our next result is in some sense a dual of 2.30, which said that every spanning 
list can be reduced to a basis. Now we show that given any linearly independent list, 
we can adjoin some additional vectors (this includes the possibility of adjoining 
no additional vectors) so that the extended list is still linearly independent but 
also spans the space. 


2.32 every linearly independent list extends to a basis 


Every linearly independent list of vectors in a finite-dimensional vector space 
can be extended to a basis of the vector space. 


Proof Suppose $u_1, \dots, u_m$ is linearly independent in a finite-dimensional vector
space $V$. Let $w_1, \dots, w_n$ be a list of vectors in $V$ that spans $V$. Thus the list
$$u_1, \dots, u_m, w_1, \dots, w_n$$
spans $V$. Applying the procedure of the proof of 2.30 to reduce this list to a
basis of $V$ produces a basis consisting of the vectors $u_1, \dots, u_m$ and some of the
$w$'s (none of the $u$'s get deleted in this procedure because $u_1, \dots, u_m$ is linearly
independent).


As an example in $\mathbf{F}^3$, suppose we start with the linearly independent list
$(2,3,4), (9,6,8)$. If we take $w_1, w_2, w_3$ to be the standard basis of $\mathbf{F}^3$, then applying
the procedure in the proof above produces the list
$$(2,3,4),\ (9,6,8),\ (0,1,0),$$
which is a basis of $\mathbf{F}^3$.
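
The example above can be reproduced by a short computation. This is a sketch under the assumption of rational entries; `pivot_count` and `extend_to_basis` are illustrative names of our own, and the function simply adjoins a spanning list and then discards every vector that lies in the span of the vectors already kept, as in the proof of 2.32.

```python
from fractions import Fraction


def pivot_count(vectors):
    """Count the pivots found by Gaussian elimination on the vectors (as rows)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    pivots = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def extend_to_basis(independent, spanning):
    """Extend a linearly independent list using vectors from a spanning list."""
    kept = list(independent)
    for w in spanning:
        # Keep w only if it is not already in the span of the kept vectors.
        if pivot_count(kept + [w]) > pivot_count(kept):
            kept.append(w)
    return kept


standard_basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(extend_to_basis([(2, 3, 4), (9, 6, 8)], standard_basis))
# [(2, 3, 4), (9, 6, 8), (0, 1, 0)]
```

Here $(1,0,0)$ is discarded because $(1,0,0) = -\tfrac{2}{5}(2,3,4) + \tfrac{1}{5}(9,6,8)$, while $(0,1,0)$ is kept and $(0,0,1)$ then lies in the span of the three kept vectors.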

As an application of the result above, we now show that every subspace of a
finite-dimensional vector space can be paired with another subspace to form a
direct sum of the whole space.

Using the same ideas but more advanced tools, the next result can be
proved without the hypothesis that $V$ is finite-dimensional.


2.33 every subspace of $V$ is part of a direct sum equal to $V$


Suppose $V$ is finite-dimensional and $U$ is a subspace of $V$. Then there is a
subspace $W$ of $V$ such that $V = U \oplus W$.


Proof Because $V$ is finite-dimensional, so is $U$ (see 2.25). Thus there is a basis
$u_1, \dots, u_m$ of $U$ (by 2.31). Of course $u_1, \dots, u_m$ is a linearly independent list of
vectors in $V$. Hence this list can be extended to a basis $u_1, \dots, u_m, w_1, \dots, w_n$ of $V$
(by 2.32). Let $W = \operatorname{span}(w_1, \dots, w_n)$.

To prove that $V = U \oplus W$, by 1.46 we only need to show that
$$V = U + W \quad \text{and} \quad U \cap W = \{0\}.$$
To prove the first equation above, suppose $v \in V$. Then, because the list
$u_1, \dots, u_m, w_1, \dots, w_n$ spans $V$, there exist $a_1, \dots, a_m, b_1, \dots, b_n \in \mathbf{F}$ such that
$$v = \underbrace{a_1 u_1 + \cdots + a_m u_m}_{u} + \underbrace{b_1 w_1 + \cdots + b_n w_n}_{w}.$$
We have $v = u + w$, where $u \in U$ and $w \in W$ are defined as above. Thus
$v \in U + W$, completing the proof that $V = U + W$.

To show that $U \cap W = \{0\}$, suppose $v \in U \cap W$. Then there exist scalars
$a_1, \dots, a_m, b_1, \dots, b_n \in \mathbf{F}$ such that
$$v = a_1 u_1 + \cdots + a_m u_m = b_1 w_1 + \cdots + b_n w_n.$$
Thus
$$a_1 u_1 + \cdots + a_m u_m - b_1 w_1 - \cdots - b_n w_n = 0.$$
Because $u_1, \dots, u_m, w_1, \dots, w_n$ is linearly independent, this implies that
$$a_1 = \cdots = a_m = b_1 = \cdots = b_n = 0.$$
Thus $v = 0$, completing the proof that $U \cap W = \{0\}$.


Exercises 2B 


1 Find all vector spaces that have exactly one basis.

2 Verify all assertions in Example 2.27.

3 (a) Let $U$ be the subspace of $\mathbf{R}^5$ defined by
$$U = \{(x_1, x_2, x_3, x_4, x_5) \in \mathbf{R}^5 : x_1 = 3x_2 \text{ and } x_3 = 7x_4\}.$$
Find a basis of $U$.
(b) Extend the basis in (a) to a basis of $\mathbf{R}^5$.
(c) Find a subspace $W$ of $\mathbf{R}^5$ such that $\mathbf{R}^5 = U \oplus W$.


4 (a) Let $U$ be the subspace of $\mathbf{C}^5$ defined by
$$U = \{(z_1, z_2, z_3, z_4, z_5) \in \mathbf{C}^5 : 6z_1 = z_2 \text{ and } z_3 + 2z_4 + 3z_5 = 0\}.$$
Find a basis of $U$.
(b) Extend the basis in (a) to a basis of $\mathbf{C}^5$.
(c) Find a subspace $W$ of $\mathbf{C}^5$ such that $\mathbf{C}^5 = U \oplus W$.


5 Suppose $V$ is finite-dimensional and $U, W$ are subspaces of $V$ such that
$V = U + W$. Prove that there exists a basis of $V$ consisting of vectors in
$U \cup W$.


6 Prove or give a counterexample: If $p_0, p_1, p_2, p_3$ is a list in $\mathcal{P}_3(\mathbf{F})$ such that
none of the polynomials $p_0, p_1, p_2, p_3$ has degree 2, then $p_0, p_1, p_2, p_3$ is not
a basis of $\mathcal{P}_3(\mathbf{F})$.


7 Suppose $v_1, v_2, v_3, v_4$ is a basis of $V$. Prove that
$$v_1 + v_2,\ v_2 + v_3,\ v_3 + v_4,\ v_4$$
is also a basis of $V$.


8 Prove or give a counterexample: If $v_1, v_2, v_3, v_4$ is a basis of $V$ and $U$ is a
subspace of $V$ such that $v_1, v_2 \in U$ and $v_3 \notin U$ and $v_4 \notin U$, then $v_1, v_2$ is a
basis of $U$.


9 Suppose $v_1, \dots, v_m$ is a list of vectors in $V$. For $k \in \{1, \dots, m\}$, let
$$w_k = v_1 + \cdots + v_k.$$
Show that $v_1, \dots, v_m$ is a basis of $V$ if and only if $w_1, \dots, w_m$ is a basis of $V$.


10 Suppose $U$ and $W$ are subspaces of $V$ such that $V = U \oplus W$. Suppose also
that $u_1, \dots, u_m$ is a basis of $U$ and $w_1, \dots, w_n$ is a basis of $W$. Prove that
$$u_1, \dots, u_m, w_1, \dots, w_n$$
is a basis of $V$.


11 Suppose $V$ is a real vector space. Show that if $v_1, \dots, v_n$ is a basis of $V$ (as a
real vector space), then $v_1, \dots, v_n$ is also a basis of the complexification $V_{\mathbf{C}}$
(as a complex vector space).

See Exercise 8 in Section 1B for the definition of the complexification $V_{\mathbf{C}}$.
2C Dimension


Although we have been discussing finite-dimensional vector spaces, we have not
yet defined the dimension of such an object. How should dimension be defined?
A reasonable definition should force the dimension of $\mathbf{F}^n$ to equal $n$. Notice that
the standard basis
$$(1, 0, \dots, 0),\ (0, 1, 0, \dots, 0),\ \dots,\ (0, \dots, 0, 1)$$
of $\mathbf{F}^n$ has length $n$. Thus we are tempted to define the dimension as the length of
a basis. However, a finite-dimensional vector space in general has many different
bases, and our attempted definition makes sense only if all bases in a given vector
space have the same length. Fortunately that turns out to be the case, as we now
show.


2.34 basis length does not depend on basis


Any two bases of a finite-dimensional vector space have the same length.


Proof Suppose $V$ is finite-dimensional. Let $B_1$ and $B_2$ be two bases of $V$. Then
$B_1$ is linearly independent in $V$ and $B_2$ spans $V$, so the length of $B_1$ is at most the
length of $B_2$ (by 2.22). Interchanging the roles of $B_1$ and $B_2$, we also see that the
length of $B_2$ is at most the length of $B_1$. Thus the length of $B_1$ equals the length
of $B_2$, as desired.


Now that we know that any two bases of a finite-dimensional vector space 
have the same length, we can formally define the dimension of such spaces. 


2.35 definition: dimension, dim V 


• The dimension of a finite-dimensional vector space is the length of any
basis of the vector space.

• The dimension of a finite-dimensional vector space $V$ is denoted by $\dim V$.


2.36 example: dimensions


• $\dim \mathbf{F}^n = n$ because the standard basis of $\mathbf{F}^n$ has length $n$.

• $\dim \mathcal{P}_m(\mathbf{F}) = m + 1$ because the standard basis $1, z, \dots, z^m$ of $\mathcal{P}_m(\mathbf{F})$ has length
$m + 1$.

• If $U = \{(x, x, y) \in \mathbf{F}^3 : x, y \in \mathbf{F}\}$, then $\dim U = 2$ because $(1,1,0), (0,0,1)$ is
a basis of $U$.

• If $U = \{(x, y, z) \in \mathbf{F}^3 : x + y + z = 0\}$, then $\dim U = 2$ because the list
$(1,-1,0), (1,0,-1)$ is a basis of $U$.
Every subspace of a finite-dimensional vector space is finite-dimensional
(by 2.25) and so has a dimension. The next result gives the expected inequality
about the dimension of a subspace.


2.37 dimension of a subspace


If $V$ is finite-dimensional and $U$ is a subspace of $V$, then $\dim U \le \dim V$.


Proof Suppose $V$ is finite-dimensional and $U$ is a subspace of $V$. Think of a basis
of $U$ as a linearly independent list in $V$, and think of a basis of $V$ as a spanning
list in $V$. Now use 2.22 to conclude that $\dim U \le \dim V$.


The real vector space $\mathbf{R}^2$ has dimension two; the complex vector space $\mathbf{C}$
has dimension one. As sets, $\mathbf{R}^2$ can be identified with $\mathbf{C}$ (and addition is
the same on both spaces, as is scalar multiplication by real numbers). Thus
when we talk about the dimension of a vector space, the role played by the
choice of $\mathbf{F}$ cannot be neglected.

To check that a list of vectors in $V$ is a basis of $V$, we must, according to
the definition, show that the list in question satisfies two properties: it must be
linearly independent and it must span $V$. The next two results show that if the list
in question has the right length, then we only need to check that it satisfies one
of the two required properties. First we prove that every linearly independent list
of the right length is a basis.


2.38 linearly independent list of the right length is a basis


Suppose $V$ is finite-dimensional. Then every linearly independent list of
vectors in $V$ of length $\dim V$ is a basis of $V$.


Proof Suppose $\dim V = n$ and $v_1, \dots, v_n$ is linearly independent in $V$. The list
$v_1, \dots, v_n$ can be extended to a basis of $V$ (by 2.32). However, every basis of $V$ has
length $n$, so in this case the extension is the trivial one, meaning that no elements
are adjoined to $v_1, \dots, v_n$. Thus $v_1, \dots, v_n$ is a basis of $V$, as desired.


The next result is a useful consequence of the previous result. 


2.39 subspace of full dimension equals the whole space 


Suppose that V is finite-dimensional and U is a subspace of V such that 
dim U = dim V. Then U = V. 


Proof Let $u_1, \dots, u_n$ be a basis of $U$. Thus $n = \dim U$, and by hypothesis we
also have $n = \dim V$. Thus $u_1, \dots, u_n$ is a linearly independent list of vectors in $V$
(because it is a basis of $U$) of length $\dim V$. From 2.38, we see that $u_1, \dots, u_n$ is
a basis of $V$. In particular every vector in $V$ is a linear combination of $u_1, \dots, u_n$.
Thus $U = V$.
2.40 example: a basis of $\mathbf{F}^2$


Consider the list $(5,7), (4,3)$ of vectors in $\mathbf{F}^2$. This list of length two is linearly
independent in $\mathbf{F}^2$ (because neither vector is a scalar multiple of the other). Note
that $\mathbf{F}^2$ has dimension two. Thus 2.38 implies that the linearly independent list
$(5,7), (4,3)$ of length two is a basis of $\mathbf{F}^2$ (we do not need to bother checking that
it spans $\mathbf{F}^2$).


2.41 example: a basis of a subspace of $\mathcal{P}_3(\mathbf{R})$


Let $U$ be the subspace of $\mathcal{P}_3(\mathbf{R})$ defined by
$$U = \{p \in \mathcal{P}_3(\mathbf{R}) : p'(5) = 0\}.$$
To find a basis of $U$, first note that each of the polynomials $1$, $(x - 5)^2$, and $(x - 5)^3$
is in $U$.
Suppose $a, b, c \in \mathbf{R}$ and
$$a + b(x - 5)^2 + c(x - 5)^3 = 0$$
for every $x \in \mathbf{R}$. Without explicitly expanding the left side of the equation above,
we can see that the left side has a $cx^3$ term. Because the right side has no $x^3$
term, this implies that $c = 0$. Because $c = 0$, we see that the left side has a $bx^2$
term, which implies that $b = 0$. Because $b = c = 0$, we can also conclude that
$a = 0$. Thus the equation above implies that $a = b = c = 0$. Hence the list
$1, (x - 5)^2, (x - 5)^3$ is linearly independent in $U$. Thus $3 \le \dim U$. Hence
$$3 \le \dim U \le \dim \mathcal{P}_3(\mathbf{R}) = 4,$$
where we have used 2.37.

The polynomial $x$ is not in $U$ because its derivative is the constant function $1$.
Thus $U \ne \mathcal{P}_3(\mathbf{R})$. Hence $\dim U \ne 4$ (by 2.39). The inequality above now implies
that $\dim U = 3$. Thus the linearly independent list $1, (x - 5)^2, (x - 5)^3$ in $U$ has
length $\dim U$ and hence is a basis of $U$ (by 2.38).
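
The claims in this example can be double-checked numerically. In the sketch below, a polynomial in $\mathcal{P}_3(\mathbf{R})$ is stored as its list of coefficients $[a_0, a_1, a_2, a_3]$; this representation and the helper names (`poly_mul`, `derivative_at`, `pivot_count`) are our own assumptions for illustration, and the coefficients used happen to be integers, so exact rational arithmetic suffices.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [a_0, a_1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def derivative_at(p, x):
    """Evaluate p'(x) for p given as a coefficient list."""
    return sum(k * c * x ** (k - 1) for k, c in enumerate(p) if k > 0)


def pivot_count(vectors):
    """Count the pivots found by Gaussian elimination on the vectors (as rows)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    pivots = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


shift = [-5, 1]                            # x - 5
one = [1, 0, 0, 0]                         # the constant polynomial 1
square = poly_mul(shift, shift) + [0]      # (x - 5)^2, padded to four coefficients
cube = poly_mul(poly_mul(shift, shift), shift)  # (x - 5)^3

print(square, cube)                        # [25, -10, 1, 0] [-125, 75, -15, 1]
print([derivative_at(p, 5) for p in (one, square, cube)])  # [0, 0, 0]
print(derivative_at([0, 1, 0, 0], 5))      # 1, so x is not in U
print(pivot_count([one, square, cube]))    # 3, so the list is linearly independent
```

The computation only confirms the three facts used in the example; the conclusion $\dim U = 3$ still rests on 2.37, 2.38, and 2.39.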


Now we prove that a spanning list of the right length is a basis. 


2.42 spanning list of the right length is a basis 


Suppose V is finite-dimensional. Then every spanning list of vectors in V of 
length dim V is a basis of V. 


Proof Suppose $\dim V = n$ and $v_1, \dots, v_n$ spans $V$. The list $v_1, \dots, v_n$ can be
reduced to a basis of $V$ (by 2.30). However, every basis of $V$ has length $n$, so in
this case the reduction is the trivial one, meaning that no elements are deleted
from $v_1, \dots, v_n$. Thus $v_1, \dots, v_n$ is a basis of $V$, as desired.
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The next result gives a formula for the dimension of the sum of two subspaces 
of a finite-dimensional vector space. This formula is analogous to a familiar 
counting formula: the number of elements in the union of two finite sets equals 
the number of elements in the first set, plus the number of elements in the second 
set, minus the number of elements in the intersection of the two sets. 


2.43 dimension of a sum 


If $V_1$ and $V_2$ are subspaces of a finite-dimensional vector space, then 

$$\dim(V_1 + V_2) = \dim V_1 + \dim V_2 - \dim(V_1 \cap V_2).$$

Proof Let $v_1, \dots, v_m$ be a basis of $V_1 \cap V_2$; thus $\dim(V_1 \cap V_2) = m$. Because 
$v_1, \dots, v_m$ is a basis of $V_1 \cap V_2$, it is linearly independent in $V_1$. Hence this list can 
be extended to a basis $v_1, \dots, v_m, u_1, \dots, u_j$ of $V_1$ (by 2.32). Thus $\dim V_1 = m + j$. 
Also extend $v_1, \dots, v_m$ to a basis $v_1, \dots, v_m, w_1, \dots, w_k$ of $V_2$; thus $\dim V_2 = m + k$. 
We will show that 

2.44    $v_1, \dots, v_m, u_1, \dots, u_j, w_1, \dots, w_k$ 

is a basis of $V_1 + V_2$. This will complete the proof, because then we will have 

$$\begin{aligned}
\dim(V_1 + V_2) &= m + j + k \\
&= (m + j) + (m + k) - m \\
&= \dim V_1 + \dim V_2 - \dim(V_1 \cap V_2).
\end{aligned}$$

The list 2.44 is contained in $V_1 \cup V_2$ and thus is contained in $V_1 + V_2$. The 
span of this list contains $V_1$ and contains $V_2$ and hence is equal to $V_1 + V_2$. Thus 
to show that 2.44 is a basis of $V_1 + V_2$ we only need to show that it is linearly 
independent. 

To prove that 2.44 is linearly independent, suppose 

$$a_1 v_1 + \cdots + a_m v_m + b_1 u_1 + \cdots + b_j u_j + c_1 w_1 + \cdots + c_k w_k = 0,$$

where all the $a$'s, $b$'s, and $c$'s are scalars. We need to prove that all the $a$'s, $b$'s, 
and $c$'s equal 0. The equation above can be rewritten as 

2.45    $c_1 w_1 + \cdots + c_k w_k = -a_1 v_1 - \cdots - a_m v_m - b_1 u_1 - \cdots - b_j u_j,$ 

which shows that $c_1 w_1 + \cdots + c_k w_k \in V_1$. All the $w$'s are in $V_2$, so this implies 
that $c_1 w_1 + \cdots + c_k w_k \in V_1 \cap V_2$. Because $v_1, \dots, v_m$ is a basis of $V_1 \cap V_2$, we have 

$$c_1 w_1 + \cdots + c_k w_k = d_1 v_1 + \cdots + d_m v_m$$

for some scalars $d_1, \dots, d_m$. But $v_1, \dots, v_m, w_1, \dots, w_k$ is linearly independent, so 
the last equation implies that all the $c$'s (and $d$'s) equal 0. Thus 2.45 becomes the 
equation 

$$a_1 v_1 + \cdots + a_m v_m + b_1 u_1 + \cdots + b_j u_j = 0.$$

Because the list $v_1, \dots, v_m, u_1, \dots, u_j$ is linearly independent, this equation implies 
that all the $a$'s and $b$'s are 0, completing the proof. 
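
The dimension formula can be tested on a concrete pair of subspaces. The following sketch is an illustration under stated assumptions rather than part of the proof: the subspaces of $\mathbf{Q}^4$ are chosen arbitrarily, the helper names (`rref`, `dim_span`, `intersection_vectors`) are hypothetical, and the intersection is computed directly by solving $\sum x_i a_i = \sum y_j b_j$ instead of being read off from the formula.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan elimination over the rationals.

    Returns the reduced matrix and the list of pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for col in range(len(m[0])):
        if r == len(m):
            break
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][col]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r:
                factor = m[i][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    return m, pivots

def dim_span(vectors):
    """Dimension of the span of a nonempty list of vectors."""
    return len(rref(vectors)[1])

def intersection_vectors(basis1, basis2):
    """Vectors spanning span(basis1) ∩ span(basis2).

    Solves sum x_i a_i - sum y_j b_j = 0 and returns each sum x_i a_i."""
    n = len(basis1[0])
    # Column c of the system is a_c for the first list and -b_j afterwards.
    columns = basis1 + [[-x for x in b] for b in basis2]
    system = [[col[row] for col in columns] for row in range(n)]
    m, pivots = rref(system)
    result = []
    for free in (c for c in range(len(columns)) if c not in pivots):
        coeffs = [Fraction(0)] * len(columns)
        coeffs[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            coeffs[pc] = -m[i][free]
        result.append([sum(coeffs[i] * basis1[i][k] for i in range(len(basis1)))
                       for k in range(n)])
    return result

# Illustrative subspaces of Q^4 (an assumption made for this example).
V1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
V2 = [[0, 1, 1, 0], [0, 0, 1, 1]]

dim_v1, dim_v2 = dim_span(V1), dim_span(V2)
dim_sum = dim_span(V1 + V2)          # list concatenation spans V1 + V2
common = intersection_vectors(V1, V2)
dim_int = dim_span(common) if common else 0
```

For these choices the intersection is spanned by $(0, 1, 1, 0)$, and the computed dimensions satisfy $4 = 3 + 2 - 1$.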
For $S$ a finite set, let $\#S$ denote the number of elements of $S$. The table below 
compares finite sets with finite-dimensional vector spaces, showing the analogy 
between $\#S$ (for sets) and $\dim V$ (for vector spaces), as well as the analogy between 
unions of subsets (in the context of sets) and sums of subspaces (in the context of 
vector spaces). 


| sets | vector spaces |
|---|---|
| $S$ is a finite set | $V$ is a finite-dimensional vector space |
| $\#S$ | $\dim V$ |
| for subsets $S_1, S_2$ of $S$, the union $S_1 \cup S_2$ is the smallest subset of $S$ containing $S_1$ and $S_2$ | for subspaces $V_1, V_2$ of $V$, the sum $V_1 + V_2$ is the smallest subspace of $V$ containing $V_1$ and $V_2$ |
| $\#(S_1 \cup S_2) = \#S_1 + \#S_2 - \#(S_1 \cap S_2)$ | $\dim(V_1 + V_2) = \dim V_1 + \dim V_2 - \dim(V_1 \cap V_2)$ |
| $\#(S_1 \cup S_2) = \#S_1 + \#S_2 \iff S_1 \cap S_2 = \varnothing$ | $\dim(V_1 + V_2) = \dim V_1 + \dim V_2 \iff V_1 \cap V_2 = \{0\}$ |
| $S_1 \cup \cdots \cup S_m$ is a disjoint union $\iff \#(S_1 \cup \cdots \cup S_m) = \#S_1 + \cdots + \#S_m$ | $V_1 + \cdots + V_m$ is a direct sum $\iff \dim(V_1 + \cdots + V_m) = \dim V_1 + \cdots + \dim V_m$ |


The last row above focuses on the analogy between disjoint unions (for sets) 
and direct sums (for vector spaces). The proof of the result in the last box above 
will be given in 3.94. 
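
The set side of the analogy is easy to check by brute force. The sketch below is an illustration only, with the four-element universe and the helper name `subsets` chosen as assumptions: it tests the counting formula and the disjointness criterion on every pair of subsets.

```python
from itertools import combinations

def subsets(universe):
    """All subsets of a finite set, as a list of Python sets."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

all_subsets = subsets({0, 1, 2, 3})

# #(S1 ∪ S2) = #S1 + #S2 - #(S1 ∩ S2) for every pair of subsets.
formula_always_holds = all(
    len(s1 | s2) == len(s1) + len(s2) - len(s1 & s2)
    for s1 in all_subsets for s2 in all_subsets
)

# #(S1 ∪ S2) = #S1 + #S2 exactly when S1 and S2 are disjoint.
counts_add_iff_disjoint = all(
    (len(s1 | s2) == len(s1) + len(s2)) == (not (s1 & s2))
    for s1 in all_subsets for s2 in all_subsets
)
```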


You should be able to find results about sets that correspond, via analogy, to 
the results about vector spaces in Exercises 12 through 18. 


Exercises 2C 


1 Show that the subspaces of $\mathbf{R}^2$ are precisely $\{0\}$, all lines in $\mathbf{R}^2$ containing 
the origin, and $\mathbf{R}^2$. 


2 Show that the subspaces of $\mathbf{R}^3$ are precisely $\{0\}$, all lines in $\mathbf{R}^3$ containing 
the origin, all planes in $\mathbf{R}^3$ containing the origin, and $\mathbf{R}^3$. 


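3 (a) Let $U = \{p \in \mathcal{P}_4(\mathbf{F}) : p(6) = 0\}$. Find a basis of $U$. 
(b) Extend the basis in (a) to a basis of $\mathcal{P}_4(\mathbf{F})$. 
(c) Find a subspace $W$ of $\mathcal{P}_4(\mathbf{F})$ such that $\mathcal{P}_4(\mathbf{F}) = U \oplus W$. 


4 (a) Let $U = \{p \in \mathcal{P}_4(\mathbf{R}) : p''(6) = 0\}$. Find a basis of $U$. 
(b) Extend the basis in (a) to a basis of $\mathcal{P}_4(\mathbf{R})$. 
(c) Find a subspace $W$ of $\mathcal{P}_4(\mathbf{R})$ such that $\mathcal{P}_4(\mathbf{R}) = U \oplus W$. 


5 (a) Let $U = \{p \in \mathcal{P}_4(\mathbf{F}) : p(2) = p(5)\}$. Find a basis of $U$. 
(b) Extend the basis in (a) to a basis of $\mathcal{P}_4(\mathbf{F})$. 
(c) Find a subspace $W$ of $\mathcal{P}_4(\mathbf{F})$ such that $\mathcal{P}_4(\mathbf{F}) = U \oplus W$. 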


6 (a) Let $U = \{p \in \mathcal{P}_4(\mathbf{F}) : p(2) = p(5) = p(6)\}$. Find a basis of $U$. 
(b) Extend the basis in (a) to a basis of $\mathcal{P}_4(\mathbf{F})$. 
(c) Find a subspace $W$ of $\mathcal{P}_4(\mathbf{F})$ such that $\mathcal{P}_4(\mathbf{F}) = U \oplus W$. 


7 (a) Let $U = \{p \in \mathcal{P}_4(\mathbf{R}) : \int_{-1}^{1} p = 0\}$. Find a basis of $U$. 
(b) Extend the basis in (a) to a basis of $\mathcal{P}_4(\mathbf{R})$. 
(c) Find a subspace $W$ of $\mathcal{P}_4(\mathbf{R})$ such that $\mathcal{P}_4(\mathbf{R}) = U \oplus W$. 


8 Suppose $v_1, \dots, v_m$ is linearly independent in $V$ and $w \in V$. Prove that 
$$\dim \operatorname{span}(v_1 + w, \dots, v_m + w) \ge m - 1.$$


9 Suppose $m$ is a positive integer and $p_0, p_1, \dots, p_m \in \mathcal{P}(\mathbf{F})$ are such that each 
$p_k$ has degree $k$. Prove that $p_0, p_1, \dots, p_m$ is a basis of $\mathcal{P}_m(\mathbf{F})$. 


10 Suppose $m$ is a positive integer. For $0 \le k \le m$, let 
$$p_k(x) = x^k (1 - x)^{m-k}.$$
Show that $p_0, \dots, p_m$ is a basis of $\mathcal{P}_m(\mathbf{F})$. 
The basis in this exercise leads to what are called Bernstein polynomials. 
You can do a web search to learn how Bernstein polynomials are used to 
approximate continuous functions on $[0, 1]$. 


11 Suppose $U$ and $W$ are both four-dimensional subspaces of $\mathbf{C}^6$. Prove that 
there exist two vectors in $U \cap W$ such that neither of these vectors is a scalar 
multiple of the other. 


12 Suppose that $U$ and $W$ are subspaces of $\mathbf{R}^8$ such that $\dim U = 3$, $\dim W = 5$, 
and $U + W = \mathbf{R}^8$. Prove that $\mathbf{R}^8 = U \oplus W$. 


13 Suppose $U$ and $W$ are both five-dimensional subspaces of $\mathbf{R}^9$. Prove that 
$U \cap W \ne \{0\}$. 


14 Suppose $V$ is a ten-dimensional vector space and $V_1, V_2, V_3$ are subspaces 
of $V$ with $\dim V_1 = \dim V_2 = \dim V_3 = 7$. Prove that $V_1 \cap V_2 \cap V_3 \ne \{0\}$. 


15 Suppose $V$ is finite-dimensional and $V_1, V_2, V_3$ are subspaces of $V$ with 
$\dim V_1 + \dim V_2 + \dim V_3 > 2 \dim V$. Prove that $V_1 \cap V_2 \cap V_3 \ne \{0\}$. 


16 Suppose $V$ is finite-dimensional and $U$ is a subspace of $V$ with $U \ne V$. Let 
$n = \dim V$ and $m = \dim U$. Prove that there exist $n - m$ subspaces of $V$, 
each of dimension $n - 1$, whose intersection equals $U$. 


17 Suppose that $V_1, \dots, V_m$ are finite-dimensional subspaces of $V$. Prove that 
$V_1 + \cdots + V_m$ is finite-dimensional and 
$$\dim(V_1 + \cdots + V_m) \le \dim V_1 + \cdots + \dim V_m.$$
The inequality above is an equality if and only if $V_1 + \cdots + V_m$ is a direct 
sum, as will be shown in 3.94. 


18 Suppose $V$ is finite-dimensional, with $\dim V = n \ge 1$. Prove that there exist 
one-dimensional subspaces $V_1, \dots, V_n$ of $V$ such that 
$$V = V_1 \oplus \cdots \oplus V_n.$$


19 Explain why you might guess, motivated by analogy with the formula for 
the number of elements in the union of three finite sets, that if $V_1, V_2, V_3$ are 
subspaces of a finite-dimensional vector space, then 

$$\begin{aligned}
\dim(V_1 + V_2 + V_3) ={}& \dim V_1 + \dim V_2 + \dim V_3 \\
& - \dim(V_1 \cap V_2) - \dim(V_1 \cap V_3) - \dim(V_2 \cap V_3) \\
& + \dim(V_1 \cap V_2 \cap V_3).
\end{aligned}$$


Then either prove the formula above or give a counterexample. 


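20 Prove that if $V_1$, $V_2$, and $V_3$ are subspaces of a finite-dimensional vector 
space, then 

$$\begin{aligned}
\dim(V_1 + V_2 + V_3) ={}& \dim V_1 + \dim V_2 + \dim V_3 \\
& - \frac{\dim(V_1 \cap V_2) + \dim(V_1 \cap V_3) + \dim(V_2 \cap V_3)}{3} \\
& - \frac{\dim((V_1 + V_2) \cap V_3) + \dim((V_1 + V_3) \cap V_2) + \dim((V_2 + V_3) \cap V_1)}{3}.
\end{aligned}$$

The formula above may seem strange because the right side does not look 
like an integer. 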


I at once gave up my former occupations, set down natural history and all its 
progeny as a deformed and abortive creation, and entertained the greatest disdain 
for a would-be science which could never even step within the threshold of real 
knowledge. In this mood I betook myself to the mathematics and the branches of 
study appertaining to that science as being built upon secure foundations, and so 
worthy of my consideration. 


—Frankenstein, Mary Wollstonecraft Shelley 
Linear Maps 


So far our attention has focused on vector spaces. No one gets excited about 
vector spaces. The interesting part of linear algebra is the subject to which we 
now turn—linear maps. 

We will frequently use the powerful fundamental theorem of linear maps, 
which states that the dimension of the domain of a linear map equals the dimension 
of the subspace that gets sent to 0 plus the dimension of the range. This will imply 
the striking result that a linear map from a finite-dimensional vector space to itself 
is one-to-one if and only if its range is the whole space. 

A major concept that we will introduce in this chapter is the matrix associated 
with a linear map and with a basis of the domain space and a basis of the target 
space. This correspondence between linear maps and matrices provides much 
insight into key aspects of linear algebra. 

This chapter concludes by introducing product, quotient, and dual spaces. 

In this chapter we will need additional vector spaces, which we call U and W, 
in addition to V. Thus our standing assumptions are now as follows. 


• $\mathbf{F}$ denotes $\mathbf{R}$ or $\mathbf{C}$. 
• $U$, $V$, and $W$ denote vector spaces over $\mathbf{F}$. 


The twelfth-century Dankwarderode Castle in Brunswick (Braunschweig), where Carl 
Friedrich Gauss (1777-1855) was born and grew up. In 1809 Gauss published a method 
for solving systems of linear equations. This method, now called Gaussian elimination, 
was used in a Chinese book written over 1600 years earlier. 


3A Vector Space of Linear Maps 


Definition and Examples of Linear Maps 


Now we are ready for one of the key definitions in linear algebra. 


3.1 definition: linear map 


A linear map from $V$ to $W$ is a function $T\colon V \to W$ with the following 
properties. 


additivity 
$T(u + v) = Tu + Tv$ for all $u, v \in V$. 


homogeneity 
$T(\lambda v) = \lambda(Tv)$ for all $\lambda \in \mathbf{F}$ and all $v \in V$. 


Some mathematicians use the phrase linear transformation, which means 
the same as linear map. 

Note that for linear maps we often use the notation $Tv$ as well as the usual 
function notation $T(v)$. 

3.2 notation: $\mathcal{L}(V, W)$, $\mathcal{L}(V)$ 


• The set of linear maps from $V$ to $W$ is denoted by $\mathcal{L}(V, W)$. 


• The set of linear maps from $V$ to $V$ is denoted by $\mathcal{L}(V)$. In other words, 
$\mathcal{L}(V) = \mathcal{L}(V, V)$. 


Let’s look at some examples of linear maps. Make sure you verify that each 
of the functions defined in the next example is indeed a linear map: 


3.3 example: linear maps 


zero 
In addition to its other uses, we let the symbol $0$ denote the linear map that takes 
every element of some vector space to the additive identity of another (or possibly 
the same) vector space. To be specific, $0 \in \mathcal{L}(V, W)$ is defined by 

$$0v = 0.$$

The $0$ on the left side of the equation above is a function from $V$ to $W$, whereas 
the $0$ on the right side is the additive identity in $W$. As usual, the context should 
allow you to distinguish between the many uses of the symbol $0$. 


identity operator 
The identity operator, denoted by $I$, is the linear map on some vector space that 
takes each element to itself. To be specific, $I \in \mathcal{L}(V)$ is defined by 

$$Iv = v.$$


differentiation 
Define $D \in \mathcal{L}(\mathcal{P}(\mathbf{R}))$ by 

$$Dp = p'.$$

The assertion that this function is a linear map is another way of stating a basic 
result about differentiation: $(f + g)' = f' + g'$ and $(\lambda f)' = \lambda f'$ whenever $f, g$ are 
differentiable and $\lambda$ is a constant. 


integration 
Define $T \in \mathcal{L}(\mathcal{P}(\mathbf{R}), \mathbf{R})$ by 

$$Tp = \int_0^1 p.$$

The assertion that this function is linear is another way of stating a basic result 
about integration: the integral of the sum of two functions equals the sum of the 
integrals, and the integral of a constant times a function equals the constant times 
the integral of the function. 


multiplication by $x^2$ 
Define a linear map $T \in \mathcal{L}(\mathcal{P}(\mathbf{R}))$ by 

$$(Tp)(x) = x^2 p(x)$$

for each $x \in \mathbf{R}$. 


backward shift 
Recall that $\mathbf{F}^\infty$ denotes the vector space of all sequences of elements of $\mathbf{F}$. Define 
a linear map $T \in \mathcal{L}(\mathbf{F}^\infty)$ by 

$$T(x_1, x_2, x_3, \dots) = (x_2, x_3, \dots).$$


from $\mathbf{R}^3$ to $\mathbf{R}^2$ 
Define a linear map $T \in \mathcal{L}(\mathbf{R}^3, \mathbf{R}^2)$ by 

$$T(x, y, z) = (2x - y + 3z, \; 7x + 5y - 6z).$$


from $\mathbf{F}^n$ to $\mathbf{F}^m$ 
To generalize the previous example, let $m$ and $n$ be positive integers, let $A_{j,k} \in \mathbf{F}$ 
for each $j = 1, \dots, m$ and each $k = 1, \dots, n$, and define a linear map $T \in \mathcal{L}(\mathbf{F}^n, \mathbf{F}^m)$ 
by 

$$T(x_1, \dots, x_n) = (A_{1,1} x_1 + \cdots + A_{1,n} x_n, \; \dots, \; A_{m,1} x_1 + \cdots + A_{m,n} x_n).$$

Actually every linear map from $\mathbf{F}^n$ to $\mathbf{F}^m$ is of this form. 


composition 
Fix a polynomial $q \in \mathcal{P}(\mathbf{R})$. Define a linear map $T \in \mathcal{L}(\mathcal{P}(\mathbf{R}))$ by 

$$(Tp)(x) = p(q(x)).$$
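
Checking additivity and homogeneity on a few inputs does not prove linearity, but it is a useful sanity check while doing the verification asked for above. The sketch below is an illustration with hypothetical helper names: it assumes polynomials are stored as coefficient lists (constant term first) and spot-checks three of the maps in this example, together with one map that fails to be linear.

```python
from fractions import Fraction
from itertools import zip_longest

def add(p, q):
    """Sum of two polynomials stored as coefficient lists (constant term first)."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]

def scale(lam, p):
    """Scalar multiple of a polynomial."""
    return [lam * a for a in p]

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def differentiate(p):
    """The map Dp = p'."""
    return [k * a for k, a in enumerate(p)][1:]

def times_x_squared(p):
    """The map (Tp)(x) = x^2 p(x)."""
    return [0, 0] + list(p)

def integrate_0_to_1(p):
    """The map Tp = integral of p over [0, 1], a scalar."""
    return sum(Fraction(a, k + 1) for k, a in enumerate(p))

def respects_linearity(T, p, q, lam):
    """Spot-check additivity and homogeneity of a polynomial-valued map T."""
    additive = trim(T(add(p, q))) == trim(add(T(p), T(q)))
    homogeneous = trim(T(scale(lam, p))) == trim(scale(lam, T(p)))
    return additive and homogeneous

p = [1, 2, 3]       # 1 + 2x + 3x^2
q = [0, -1, 0, 4]   # -x + 4x^3
lam = 5

derivative_ok = respects_linearity(differentiate, p, q, lam)
multiplication_ok = respects_linearity(times_x_squared, p, q, lam)
integral_ok = (
    integrate_0_to_1(add(p, q)) == integrate_0_to_1(p) + integrate_0_to_1(q)
    and integrate_0_to_1(scale(lam, p)) == lam * integrate_0_to_1(p)
)

# Adding the constant 1 to every polynomial is not a linear map.
shift_ok = respects_linearity(lambda r: add(r, [1]), p, q, lam)
```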


The existence part of the next result means that we can find a linear map that 
takes on whatever values we wish on the vectors in a basis. The uniqueness part 
of the next result means that a linear map is completely determined by its values 
on a basis. 




3.4 linear map lemma 


Suppose $v_1, \dots, v_n$ is a basis of $V$ and $w_1, \dots, w_n \in W$. Then there exists a 
unique linear map $T\colon V \to W$ such that 

$$Tv_k = w_k$$

for each $k = 1, \dots, n$. 


Proof First we show the existence of a linear map $T$ with the desired property. 
Define $T\colon V \to W$ by 

$$T(c_1 v_1 + \cdots + c_n v_n) = c_1 w_1 + \cdots + c_n w_n,$$

where $c_1, \dots, c_n$ are arbitrary elements of $\mathbf{F}$. The list $v_1, \dots, v_n$ is a basis of $V$. Thus 
the equation above does indeed define a function $T$ from $V$ to $W$ (because each 
element of $V$ can be uniquely written in the form $c_1 v_1 + \cdots + c_n v_n$). 

For each $k$, taking $c_k = 1$ and the other $c$'s equal to 0 in the equation above 
shows that $Tv_k = w_k$. 

If $u, v \in V$ with $u = a_1 v_1 + \cdots + a_n v_n$ and $v = c_1 v_1 + \cdots + c_n v_n$, then 

$$\begin{aligned}
T(u + v) &= T\bigl((a_1 + c_1) v_1 + \cdots + (a_n + c_n) v_n\bigr) \\
&= (a_1 + c_1) w_1 + \cdots + (a_n + c_n) w_n \\
&= (a_1 w_1 + \cdots + a_n w_n) + (c_1 w_1 + \cdots + c_n w_n) \\
&= Tu + Tv.
\end{aligned}$$

Similarly, if $\lambda \in \mathbf{F}$ and $v = c_1 v_1 + \cdots + c_n v_n$, then 

$$\begin{aligned}
T(\lambda v) &= T(\lambda c_1 v_1 + \cdots + \lambda c_n v_n) \\
&= \lambda c_1 w_1 + \cdots + \lambda c_n w_n \\
&= \lambda (c_1 w_1 + \cdots + c_n w_n) \\
&= \lambda Tv.
\end{aligned}$$

Thus $T$ is a linear map from $V$ to $W$. 

To prove uniqueness, now suppose that $T \in \mathcal{L}(V, W)$ and that $Tv_k = w_k$ for 
each $k = 1, \dots, n$. Let $c_1, \dots, c_n \in \mathbf{F}$. Then the homogeneity of $T$ implies that 
$T(c_k v_k) = c_k w_k$ for each $k = 1, \dots, n$. The additivity of $T$ now implies that 

$$T(c_1 v_1 + \cdots + c_n v_n) = c_1 w_1 + \cdots + c_n w_n.$$

Thus $T$ is uniquely determined on $\operatorname{span}(v_1, \dots, v_n)$ by the equation above. Because 
$v_1, \dots, v_n$ is a basis of $V$, this implies that $T$ is uniquely determined on $V$, as 
desired. 
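
The existence half of the proof is constructive, and the construction can be carried out directly in a small case. The sketch below is an illustration under stated assumptions: it takes $V = \mathbf{Q}^2$ and $W = \mathbf{Q}^3$, finds the coordinates $c_1, c_2$ by Cramer's rule, and uses hypothetical helper names `coordinates` and `make_linear_map`.

```python
from fractions import Fraction

def coordinates(v, v1, v2):
    """Solve c1*v1 + c2*v2 = v in Q^2 by Cramer's rule (v1, v2 must be a basis)."""
    det = v1[0] * v2[1] - v1[1] * v2[0]
    if det == 0:
        raise ValueError("v1, v2 is not a basis")
    c1 = Fraction(v[0] * v2[1] - v[1] * v2[0], det)
    c2 = Fraction(v1[0] * v[1] - v1[1] * v[0], det)
    return c1, c2

def make_linear_map(v1, v2, w1, w2):
    """Return the map T with T(c1*v1 + c2*v2) = c1*w1 + c2*w2, as in the proof."""
    def T(v):
        c1, c2 = coordinates(v, v1, v2)
        return tuple(c1 * a + c2 * b for a, b in zip(w1, w2))
    return T

# A basis of Q^2 and arbitrary target vectors in Q^3 (chosen for illustration).
v1, v2 = (1, 1), (1, -1)
w1, w2 = (1, 0, 2), (0, 3, -1)

T = make_linear_map(v1, v2, w1, w2)

image_of_v1 = T(v1)        # equals w1
image_of_v2 = T(v2)        # equals w2
image_of_sum = T((2, 0))   # (2, 0) = v1 + v2, so this equals w1 + w2
```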
Algebraic Operations on $\mathcal{L}(V, W)$ 
We begin by defining addition and scalar multiplication on $\mathcal{L}(V, W)$. 


3.5 definition: addition and scalar multiplication on $\mathcal{L}(V, W)$ 


Suppose $S, T \in \mathcal{L}(V, W)$ and $\lambda \in \mathbf{F}$. The sum $S + T$ and the product $\lambda T$ are 
the linear maps from $V$ to $W$ defined by 

$$(S + T)(v) = Sv + Tv \quad \text{and} \quad (\lambda T)(v) = \lambda(Tv)$$

for all $v \in V$. 


Linear maps are pervasive throughout mathematics. However, they are not as 
ubiquitous as imagined by people who seem to think cos is a linear map from 
$\mathbf{R}$ to $\mathbf{R}$ when they incorrectly write that $\cos(x + y)$ equals $\cos x + \cos y$ and that 
$\cos 2x$ equals $2 \cos x$. 

You should verify that $S + T$ and $\lambda T$ as defined above are indeed linear maps. 
In other words, if $S, T \in \mathcal{L}(V, W)$ and $\lambda \in \mathbf{F}$, then $S + T \in \mathcal{L}(V, W)$ and 
$\lambda T \in \mathcal{L}(V, W)$. 

Because we took the trouble to define addition and scalar multiplication on 
$\mathcal{L}(V, W)$, the next result should not be a surprise. 

3.6 $\mathcal{L}(V, W)$ is a vector space 


With the operations of addition and scalar multiplication as defined above, 
$\mathcal{L}(V, W)$ is a vector space. 


The routine proof of the result above is left to the reader. Note that the additive 
identity of $\mathcal{L}(V, W)$ is the zero linear map defined in Example 3.3. 

Usually it makes no sense to multiply together two elements of a vector space, 
but for some pairs of linear maps a useful product exists, as in the next definition. 


3.7 definition: product of linear maps 


If $T \in \mathcal{L}(U, V)$ and $S \in \mathcal{L}(V, W)$, then the product $ST \in \mathcal{L}(U, W)$ is defined 
by 

$$(ST)(u) = S(Tu)$$

for all $u \in U$. 


Thus $ST$ is just the usual composition $S \circ T$ of two functions, but when both 
functions are linear, we usually write $ST$ instead of $S \circ T$. The product notation 
$ST$ helps make the distributive properties (see next result) seem natural. 

Note that $ST$ is defined only when $T$ maps into the domain of $S$. You should 
verify that $ST$ is indeed a linear map from $U$ to $W$ whenever $T \in \mathcal{L}(U, V)$ and 
$S \in \mathcal{L}(V, W)$. 

3.8 algebraic properties of products of linear maps 


associativity 
$(T_1 T_2) T_3 = T_1 (T_2 T_3)$ whenever $T_1$, $T_2$, and $T_3$ are linear maps such that 
the products make sense (meaning $T_3$ maps into the domain of $T_2$, and $T_2$ 
maps into the domain of $T_1$). 


identity 
$TI = IT = T$ whenever $T \in \mathcal{L}(V, W)$; here the first $I$ is the identity operator 
on $V$, and the second $I$ is the identity operator on $W$. 


distributive properties 
$(S_1 + S_2) T = S_1 T + S_2 T$ and $S(T_1 + T_2) = S T_1 + S T_2$ whenever 
$T, T_1, T_2 \in \mathcal{L}(U, V)$ and $S, S_1, S_2 \in \mathcal{L}(V, W)$. 


The routine proof of the result above is left to the reader. 
Multiplication of linear maps is not commutative. In other words, it is not 
necessarily true that ST = TS, even if both sides of the equation make sense. 


3.9 example: two noncommuting linear maps from $\mathcal{P}(\mathbf{R})$ to $\mathcal{P}(\mathbf{R})$ 


Suppose $D \in \mathcal{L}(\mathcal{P}(\mathbf{R}))$ is the differentiation map defined in Example 3.3 
and $T \in \mathcal{L}(\mathcal{P}(\mathbf{R}))$ is the multiplication by $x^2$ map defined earlier in this section. 
Then 

$$((TD)p)(x) = x^2 p'(x) \quad \text{but} \quad ((DT)p)(x) = x^2 p'(x) + 2x p(x).$$

Thus $TD \ne DT$: differentiating and then multiplying by $x^2$ is not the same as 
multiplying by $x^2$ and then differentiating. 
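
The two products can be compared on a specific polynomial. In the sketch below, offered only as an illustration, polynomials are again assumed to be coefficient lists with the constant term first, and the function names `D` and `T` mirror the maps in the example.

```python
def D(p):
    """Differentiation: Dp = p'."""
    return [k * a for k, a in enumerate(p)][1:]

def T(p):
    """Multiplication by x^2: (Tp)(x) = x^2 p(x)."""
    return [0, 0] + list(p)

p = [1, 2, 3]                      # p(x) = 1 + 2x + 3x^2

TD_p = T(D(p))                     # x^2 p'(x) = 2x^2 + 6x^3
DT_p = D(T(p))                     # x^2 p'(x) + 2x p(x) = 2x + 6x^2 + 12x^3

# The difference (DT - TD)p should be 2x p(x) = 2x + 4x^2 + 6x^3.
difference = [a - b for a, b in zip(DT_p, TD_p)]
two_x_p = [0] + [2 * a for a in p]
```

For this choice of $p$ the lists `TD_p` and `DT_p` differ, and their difference agrees with $2x\,p(x)$ as the formulas above predict.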


3.10 linear maps take 0 to 0 


Suppose T is a linear map from V to W. Then T(0) = 0. 


Proof By additivity, we have 

$$T(0) = T(0 + 0) = T(0) + T(0).$$

Add the additive inverse of $T(0)$ to each side of the equation above to conclude 
that $T(0) = 0$. 


Suppose $m, b \in \mathbf{R}$. The function $f\colon \mathbf{R} \to \mathbf{R}$ defined by 

$$f(x) = mx + b$$

is a linear map if and only if $b = 0$ (use 3.10). Thus the linear functions of high 
school algebra are not the same as linear maps in the context of linear algebra. 
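
A quick numerical check makes the distinction concrete. The sketch below is an illustration with hypothetical helper names (`affine`, `is_additive`, `is_homogeneous`); it tests the two defining properties on a small range of integers, which can refute linearity but cannot prove it.

```python
def affine(m, b):
    """Return the function f(x) = m*x + b."""
    return lambda x: m * x + b

def is_additive(f, samples):
    """Check f(x + y) == f(x) + f(y) on all pairs of sample points."""
    return all(f(x + y) == f(x) + f(y) for x in samples for y in samples)

def is_homogeneous(f, samples, scalars):
    """Check f(c * x) == c * f(x) for all sample points and sample scalars."""
    return all(f(c * x) == c * f(x) for x in samples for c in scalars)

samples = range(-3, 4)
scalars = range(-2, 3)

through_origin = affine(5, 0)   # b = 0
shifted = affine(5, 2)          # b != 0

origin_passes = is_additive(through_origin, samples) and is_homogeneous(
    through_origin, samples, scalars
)
shifted_passes = is_additive(shifted, samples)
value_at_zero = shifted(0)      # nonzero, so 3.10 already rules out linearity
```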
Exercises 3A 


1 Suppose $b, c \in \mathbf{R}$. Define $T\colon \mathbf{R}^3 \to \mathbf{R}^2$ by 
$$T(x, y, z) = (2x - 4y + 3z + b, \; 6x + cxyz).$$
Show that $T$ is linear if and only if $b = c = 0$. 


2 Suppose $b, c \in \mathbf{R}$. Define $T\colon \mathcal{P}(\mathbf{R}) \to \mathbf{R}^2$ by 
$$Tp = \Bigl(3p(4) + 5p'(6) + b\,p(1)p(2), \; \int_{-1}^{2} x^3 p(x)\,dx + c \sin p(0)\Bigr).$$
Show that $T$ is linear if and only if $b = c = 0$. 


3 Suppose that $T \in \mathcal{L}(\mathbf{F}^n, \mathbf{F}^m)$. Show that there exist scalars $A_{j,k} \in \mathbf{F}$ for 
$j = 1, \dots, m$ and $k = 1, \dots, n$ such that 
$$T(x_1, \dots, x_n) = (A_{1,1} x_1 + \cdots + A_{1,n} x_n, \; \dots, \; A_{m,1} x_1 + \cdots + A_{m,n} x_n)$$
for every $(x_1, \dots, x_n) \in \mathbf{F}^n$. 
This exercise shows that the linear map $T$ has the form promised in the 
second to last item of Example 3.3. 


4 Suppose $T \in \mathcal{L}(V, W)$ and $v_1, \dots, v_m$ is a list of vectors in $V$ such that 
$Tv_1, \dots, Tv_m$ is a linearly independent list in $W$. Prove that $v_1, \dots, v_m$ is 
linearly independent. 


5 Prove that $\mathcal{L}(V, W)$ is a vector space, as was asserted in 3.6. 


6 Prove that multiplication of linear maps has the associative, identity, and 
distributive properties asserted in 3.8. 


7 Show that every linear map from a one-dimensional vector space to itself is 
multiplication by some scalar. More precisely, prove that if $\dim V = 1$ and 
$T \in \mathcal{L}(V)$, then there exists $\lambda \in \mathbf{F}$ such that $Tv = \lambda v$ for all $v \in V$. 


8 Give an example of a function $\varphi\colon \mathbf{R}^2 \to \mathbf{R}$ such that 
$$\varphi(av) = a\varphi(v)$$
for all $a \in \mathbf{R}$ and all $v \in \mathbf{R}^2$ but $\varphi$ is not linear. 
This exercise and the next exercise show that neither homogeneity nor 
additivity alone is enough to imply that a function is a linear map. 


9 Give an example of a function $\varphi\colon \mathbf{C} \to \mathbf{C}$ such that 
$$\varphi(w + z) = \varphi(w) + \varphi(z)$$
for all $w, z \in \mathbf{C}$ but $\varphi$ is not linear. (Here $\mathbf{C}$ is thought of as a complex vector 
space.) 
There also exists a function $\varphi\colon \mathbf{R} \to \mathbf{R}$ such that $\varphi$ satisfies the additivity 
condition above but $\varphi$ is not linear. However, showing the existence of such 
a function involves considerably more advanced tools. 


10 Prove or give a counterexample: If $q \in \mathcal{P}(\mathbf{R})$ and $T\colon \mathcal{P}(\mathbf{R}) \to \mathcal{P}(\mathbf{R})$ is 
defined by $Tp = q \circ p$, then $T$ is a linear map. 


The function T defined here differs from the function T defined in the last 
bullet point of 3.3 by the order of the functions in the compositions. 


11 Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V)$. Prove that $T$ is a scalar 
multiple of the identity if and only if $ST = TS$ for every $S \in \mathcal{L}(V)$. 


12 Suppose $U$ is a subspace of $V$ with $U \ne V$. Suppose $S \in \mathcal{L}(U, W)$ and 
$S \ne 0$ (which means that $Su \ne 0$ for some $u \in U$). Define $T\colon V \to W$ by 

$$Tv = \begin{cases} Sv & \text{if } v \in U, \\ 0 & \text{if } v \in V \text{ and } v \notin U. \end{cases}$$

Prove that $T$ is not a linear map on $V$. 


13 Suppose $V$ is finite-dimensional. Prove that every linear map on a subspace 
of $V$ can be extended to a linear map on $V$. In other words, show that if $U$ is 
a subspace of $V$ and $S \in \mathcal{L}(U, W)$, then there exists $T \in \mathcal{L}(V, W)$ such that 
$Tu = Su$ for all $u \in U$. 


The result in this exercise is used in the proof of 3.125. 


14 Suppose $V$ is finite-dimensional with $\dim V > 0$, and suppose $W$ is infinite- 
dimensional. Prove that $\mathcal{L}(V, W)$ is infinite-dimensional. 


15 Suppose $v_1, \dots, v_m$ is a linearly dependent list of vectors in $V$. Suppose 
also that $W \ne \{0\}$. Prove that there exist $w_1, \dots, w_m \in W$ such that no 
$T \in \mathcal{L}(V, W)$ satisfies $Tv_k = w_k$ for each $k = 1, \dots, m$. 


16 Suppose $V$ is finite-dimensional with $\dim V > 1$. Prove that there exist 
$S, T \in \mathcal{L}(V)$ such that $ST \ne TS$. 


17 Suppose $V$ is finite-dimensional. Show that the only two-sided ideals of 
$\mathcal{L}(V)$ are $\{0\}$ and $\mathcal{L}(V)$. 


A subspace $\mathcal{E}$ of $\mathcal{L}(V)$ is called a two-sided ideal of $\mathcal{L}(V)$ if $TE \in \mathcal{E}$ and 
$ET \in \mathcal{E}$ for all $E \in \mathcal{E}$ and all $T \in \mathcal{L}(V)$. 

3B Null Spaces and Ranges 


Null Space and Injectivity 


In this section we will learn about two subspaces that are intimately connected 
with each linear map. We begin with the set of vectors that get mapped to 0. 


3.11 definition: null space, null T 


For $T \in \mathcal{L}(V, W)$, the null space of $T$, denoted by $\operatorname{null} T$, is the subset of $V$ 
consisting of those vectors that $T$ maps to 0: 

$$\operatorname{null} T = \{v \in V : Tv = 0\}.$$


3.12 example: null space 


• If $T$ is the zero map from $V$ to $W$, meaning that $Tv = 0$ for every $v \in V$, then 
$\operatorname{null} T = V$. 


• Suppose $\varphi \in \mathcal{L}(\mathbf{C}^3, \mathbf{C})$ is defined by $\varphi(z_1, z_2, z_3) = z_1 + 2z_2 + 3z_3$. Then 
$\operatorname{null} \varphi$ equals $\{(z_1, z_2, z_3) \in \mathbf{C}^3 : z_1 + 2z_2 + 3z_3 = 0\}$, which is a subspace of 
the domain of $\varphi$. We will soon see that the null space of each linear map is a 
subspace of its domain. 


• Suppose $D \in \mathcal{L}(\mathcal{P}(\mathbf{R}))$ is the differentiation map defined by $Dp = p'$. 
The only functions whose derivative equals the zero function are the constant 
functions. Thus the null space of $D$ equals the set of constant functions. 


• Suppose that $T \in \mathcal{L}(\mathcal{P}(\mathbf{R}))$ is the multiplication by $x^2$ map defined by 
$(Tp)(x) = x^2 p(x)$. The only polynomial $p$ such that $x^2 p(x) = 0$ for all $x \in \mathbf{R}$ 
is the 0 polynomial. Thus $\operatorname{null} T = \{0\}$. 


• Suppose $T \in \mathcal{L}(\mathbf{F}^\infty)$ is the backward shift defined by 

$$T(x_1, x_2, x_3, \dots) = (x_2, x_3, \dots).$$

Then $T(x_1, x_2, x_3, \dots)$ equals 0 if and only if the numbers $x_2, x_3, \dots$ are all 0. 
Thus $\operatorname{null} T = \{(a, 0, 0, \dots) : a \in \mathbf{F}\}$. 


The word “null” means zero. Thus the term “null space” should remind you 
of the connection to 0. Some mathematicians use the term kernel instead 
of null space. 


The next result shows that the null space of each linear map is a subspace of 
the domain. In particular, 0 is in the null space of every linear map. 


3.13 the null space is a subspace 


Suppose $T \in \mathcal{L}(V, W)$. Then $\operatorname{null} T$ is a subspace of $V$. 


Proof Because $T$ is a linear map, $T(0) = 0$ (by 3.10). Thus $0 \in \operatorname{null} T$. 
Suppose $u, v \in \operatorname{null} T$. Then 

$$T(u + v) = Tu + Tv = 0 + 0 = 0.$$

Hence $u + v \in \operatorname{null} T$. Thus $\operatorname{null} T$ is closed under addition. 
Suppose $u \in \operatorname{null} T$ and $\lambda \in \mathbf{F}$. Then 

$$T(\lambda u) = \lambda Tu = \lambda 0 = 0.$$

Hence $\lambda u \in \operatorname{null} T$. Thus $\operatorname{null} T$ is closed under scalar multiplication. 
We have shown that $\operatorname{null} T$ contains 0 and is closed under addition and scalar 
multiplication. Thus $\operatorname{null} T$ is a subspace of $V$ (by 1.34). 


As we will soon see, for a linear map the next definition is closely connected 
to the null space. 


3.14 definition: injective 


A function $T\colon V \to W$ is called injective if $Tu = Tv$ implies $u = v$. 


The term one-to-one means the same as injective. 

We could rephrase the definition above to say that $T$ is injective if $u \ne v$ 
implies that $Tu \ne Tv$. Thus $T$ is injective if and only if it maps distinct inputs 
to distinct outputs. 

The next result says that we can check whether a linear map is injective 
by checking whether 0 is the only vector that gets mapped to 0. As a simple 
application of this result, we see that of the linear maps whose null spaces we 
computed in 3.12, only multiplication by $x^2$ is injective (except that the zero map 
is injective in the special case $V = \{0\}$). 


3.15 injectivity $\iff$ null space equals $\{0\}$ 


Let $T \in \mathcal{L}(V, W)$. Then $T$ is injective if and only if $\operatorname{null} T = \{0\}$. 


Proof First suppose $T$ is injective. We want to prove that $\operatorname{null} T = \{0\}$. We 
already know that $\{0\} \subseteq \operatorname{null} T$ (by 3.10). To prove the inclusion in the other 
direction, suppose $v \in \operatorname{null} T$. Then 

$$T(v) = 0 = T(0).$$

Because $T$ is injective, the equation above implies that $v = 0$. Thus we can 
conclude that $\operatorname{null} T = \{0\}$, as desired. 

To prove the implication in the other direction, now suppose $\operatorname{null} T = \{0\}$. We 
want to prove that $T$ is injective. To do this, suppose $u, v \in V$ and $Tu = Tv$. Then 

$$0 = Tu - Tv = T(u - v).$$

Thus $u - v$ is in $\operatorname{null} T$, which equals $\{0\}$. Hence $u - v = 0$, which implies that 
$u = v$. Hence $T$ is injective, as desired. 

Range and Surjectivity 


Now we give a name to the set of outputs of a linear map. 


3.16 definition: range 


For $T \in \mathcal{L}(V, W)$, the range of $T$ is the subset of $W$ consisting of those vectors 
that are equal to $Tv$ for some $v \in V$: 

$$\operatorname{range} T = \{Tv : v \in V\}.$$


3.17 example: range 


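• If $T$ is the zero map from $V$ to $W$, meaning that $Tv = 0$ for every $v \in V$, then 
$\operatorname{range} T = \{0\}$. 


• Suppose $T \in \mathcal{L}(\mathbf{R}^2, \mathbf{R}^3)$ is defined by $T(x, y) = (2x, 5y, x + y)$. Then 
$$\operatorname{range} T = \{(2x, 5y, x + y) : x, y \in \mathbf{R}\}.$$
Note that $\operatorname{range} T$ is a subspace of $\mathbf{R}^3$. We will soon see that the range of each 
element of $\mathcal{L}(V, W)$ is a subspace of $W$. 


• Suppose $D \in \mathcal{L}(\mathcal{P}(\mathbf{R}))$ is the differentiation map defined by $Dp = p'$. Because 
for every polynomial $q \in \mathcal{P}(\mathbf{R})$ there exists a polynomial $p \in \mathcal{P}(\mathbf{R})$ such that 
$p' = q$, the range of $D$ is $\mathcal{P}(\mathbf{R})$. 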
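
Membership in the range of the map $T(x, y) = (2x, 5y, x + y)$ from the second item can be tested directly. In the sketch below, given only as an illustration, the helper name `preimage` is hypothetical: the first two coordinates of a target vector force $x$ and $y$, and the third coordinate then either matches $x + y$ or it does not.

```python
from fractions import Fraction

def T(x, y):
    """The map T(x, y) = (2x, 5y, x + y) from the example."""
    return (2 * x, 5 * y, x + y)

def preimage(w):
    """Return (x, y) with T(x, y) == w, or None when w is not in range T."""
    a, b, c = w
    x, y = Fraction(a, 2), Fraction(b, 5)   # forced by the first two coordinates
    return (x, y) if x + y == c else None

in_range = preimage((2, 5, 2))       # T(1, 1) = (2, 5, 2)
not_in_range = preimage((0, 0, 1))   # would need x = y = 0 but x + y = 1
```

Because some vector of $\mathbf{R}^3$ has no preimage, this particular $T$ does not have range equal to $\mathbf{R}^3$.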


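The next result shows that the range of each linear map is a subspace of the 
vector space into which it is being mapped. 


3.18 the range is a subspace 


If $T \in \mathcal{L}(V, W)$, then $\operatorname{range} T$ is a subspace of $W$. 


Proof Suppose $T \in \mathcal{L}(V, W)$. Then $T(0) = 0$ (by 3.10), which implies that 
$0 \in \operatorname{range} T$. 
If $w_1, w_2 \in \operatorname{range} T$, then there exist $v_1, v_2 \in V$ such that $Tv_1 = w_1$ and 
$Tv_2 = w_2$. Thus 

$$T(v_1 + v_2) = Tv_1 + Tv_2 = w_1 + w_2.$$

Hence $w_1 + w_2 \in \operatorname{range} T$. Thus $\operatorname{range} T$ is closed under addition. 
If $w \in \operatorname{range} T$ and $\lambda \in \mathbf{F}$, then there exists $v \in V$ such that $Tv = w$. Thus 

$$T(\lambda v) = \lambda Tv = \lambda w.$$

Hence $\lambda w \in \operatorname{range} T$. Thus $\operatorname{range} T$ is closed under scalar multiplication. 
We have shown that $\operatorname{range} T$ contains 0 and is closed under addition and scalar 
multiplication. Thus $\operatorname{range} T$ is a subspace of $W$ (by 1.34). 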




3.19 definition: surjective 


A function $T\colon V \to W$ is called surjective if its range equals $W$. 


Some people use the term onto, which means the same as surjective. 

To illustrate the definition above, note that of the ranges we computed in 3.17, 
only the differentiation map is surjective (except that the zero map is surjective in 
the special case $W = \{0\}$). 

Whether a linear map is surjective depends on what we are thinking of as the 
vector space into which it maps. 


3.20 example: surjectivity depends on the target space 


The differentiation map $D \in \mathcal{L}(\mathcal{P}_5(\mathbf{R}))$ defined by $Dp = p'$ is not surjective, 
because the polynomial $x^5$ is not in the range of $D$. However, the differentiation 
map $S \in \mathcal{L}(\mathcal{P}_5(\mathbf{R}), \mathcal{P}_4(\mathbf{R}))$ defined by $Sp = p'$ is surjective, because its range 
equals $\mathcal{P}_4(\mathbf{R})$, which is the vector space into which $S$ maps. 


Fundamental Theorem of Linear Maps 


The next result is so important that it gets a dramatic name. 


3.21 fundamental theorem of linear maps 


Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V, W)$. Then $\operatorname{range} T$ is finite- 
dimensional and 

$$\dim V = \dim \operatorname{null} T + \dim \operatorname{range} T.$$


Proof Let $u_1, \dots, u_m$ be a basis of $\operatorname{null} T$; thus $\dim \operatorname{null} T = m$. The linearly 
independent list $u_1, \dots, u_m$ can be extended to a basis 

$$u_1, \dots, u_m, v_1, \dots, v_n$$

of $V$ (by 2.32). Thus $\dim V = m + n$. To complete the proof, we need to show that 
$\operatorname{range} T$ is finite-dimensional and $\dim \operatorname{range} T = n$. We will do this by proving 
that $Tv_1, \dots, Tv_n$ is a basis of $\operatorname{range} T$. 

Let $v \in V$. Because $u_1, \dots, u_m, v_1, \dots, v_n$ spans $V$, we can write 

$$v = a_1 u_1 + \cdots + a_m u_m + b_1 v_1 + \cdots + b_n v_n,$$

where the $a$'s and $b$'s are in $\mathbf{F}$. Applying $T$ to both sides of this equation, we get 

$$Tv = b_1 Tv_1 + \cdots + b_n Tv_n,$$

where the terms of the form $Tu_k$ disappeared because each $u_k$ is in $\operatorname{null} T$. The 
last equation implies that the list $Tv_1, \dots, Tv_n$ spans $\operatorname{range} T$. In particular, $\operatorname{range} T$ 
is finite-dimensional. 

To show $Tv_1, \dots, Tv_n$ is linearly independent, suppose $c_1, \dots, c_n \in \mathbf{F}$ and 

$$c_1 Tv_1 + \cdots + c_n Tv_n = 0.$$

Then 

$$T(c_1 v_1 + \cdots + c_n v_n) = 0.$$

Hence 

$$c_1 v_1 + \cdots + c_n v_n \in \operatorname{null} T.$$

Because $u_1, \dots, u_m$ spans $\operatorname{null} T$, we can write 

$$c_1 v_1 + \cdots + c_n v_n = d_1 u_1 + \cdots + d_m u_m,$$

where the $d$'s are in $\mathbf{F}$. This equation implies that all the $c$'s (and $d$'s) are 0 
(because $u_1, \dots, u_m, v_1, \dots, v_n$ is linearly independent). Thus $Tv_1, \dots, Tv_n$ is linearly 
independent and hence is a basis of $\operatorname{range} T$, as desired. 
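
The structure of this proof can be followed on a concrete map. The sketch below is an illustration under stated assumptions: the map $T\colon \mathbf{Q}^4 \to \mathbf{Q}^3$ and the vectors playing the roles of $u_1, u_2$ and $v_1, v_2$ are chosen for the example, and the helper name `rank` is hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def T(v):
    """An illustrative linear map from Q^4 to Q^3."""
    x, y, z, w = v
    return [x + 2 * y + w, y + z, x + 3 * y + z + w]

null_basis = [[2, -1, 1, 0], [-1, 0, 0, 1]]   # u_1, u_2: both are sent to 0
extension = [[1, 0, 0, 0], [0, 1, 0, 0]]      # v_1, v_2: extend to a basis of Q^4

kills_null = all(T(u) == [0, 0, 0] for u in null_basis)
is_basis_of_domain = rank(null_basis + extension) == 4

images = [T(v) for v in extension]            # Tv_1, Tv_2
dim_range = rank(images)                      # 2 when Tv_1, Tv_2 are independent
dim_null = len(null_basis)
```

Since $u_1, u_2, v_1, v_2$ is a basis of the domain and $Tv_1, Tv_2$ is linearly independent, the argument in the proof shows that $u_1, u_2$ spans $\operatorname{null} T$, and the dimensions satisfy $4 = 2 + 2$.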


Now we can show that no linear map from a finite-dimensional vector space 
to a “smaller” vector space can be injective, where “smaller” is measured by 
dimension. 


3.22 linear map to a lower-dimensional space is not injective 


Suppose $V$ and $W$ are finite-dimensional vector spaces such that 
$\dim V > \dim W$. Then no linear map from $V$ to $W$ is injective. 


Proof Let $T \in \mathcal{L}(V, W)$. Then 

$$\begin{aligned}
\dim \operatorname{null} T &= \dim V - \dim \operatorname{range} T \\
&\ge \dim V - \dim W \\
&> 0,
\end{aligned}$$

where the first line above comes from the fundamental theorem of linear maps 
(3.21) and the second line follows from 2.37. The inequality above states that 
$\dim \operatorname{null} T > 0$. This means that $\operatorname{null} T$ contains vectors other than 0. Thus $T$ is 
not injective (by 3.15). 


3.23 example: linear map from $\mathbf{F}^4$ to $\mathbf{F}^3$ is not injective 


Define a linear map $T\colon \mathbf{F}^4 \to \mathbf{F}^3$ by 

$$T(z_1, z_2, z_3, z_4) = \bigl(\sqrt{7}\, z_1 + \pi z_2 + z_4, \; 97 z_1 + 3 z_2 + 2 z_3, \; z_2 + 6 z_3 + 7 z_4\bigr).$$

Because $\dim \mathbf{F}^4 > \dim \mathbf{F}^3$, we can use 3.22 to assert that $T$ is not injective, without 
doing any calculations. 

The next result shows that no linear map from a finite-dimensional vector 
space to a “bigger” vector space can be surjective, where “bigger” is measured by 
dimension. 


3.24 linear map to a higher-dimensional space is not surjective 


Suppose $V$ and $W$ are finite-dimensional vector spaces such that 
$\dim V < \dim W$. Then no linear map from $V$ to $W$ is surjective. 


Proof Let $T \in \mathcal{L}(V, W)$. Then 

$$\begin{aligned}
\dim \operatorname{range} T &= \dim V - \dim \operatorname{null} T \\
&\le \dim V \\
&< \dim W,
\end{aligned}$$

where the equality above comes from the fundamental theorem of linear maps 
(3.21). The inequality above states that $\dim \operatorname{range} T < \dim W$. This means that 
$\operatorname{range} T$ cannot equal $W$. Thus $T$ is not surjective. 


As we will soon see, 3.22 and 3.24 have important consequences in the theory 
of linear equations. The idea is to express questions about systems of linear 
equations in terms of linear maps. Let’s begin by rephrasing in terms of linear 
maps the question of whether a homogeneous system of linear equations has a 
nonzero solution. 


Homogeneous, in this context, means that the constant term on the right side 
of each equation below is 0. 

Fix positive integers $m$ and $n$, and let $A_{j,k} \in \mathbf{F}$ for $j = 1, \dots, m$ and $k = 1, \dots, n$. 
Consider the homogeneous system of linear equations 

$$\begin{aligned}
\sum_{k=1}^{n} A_{1,k} x_k &= 0 \\
&\;\;\vdots \\
\sum_{k=1}^{n} A_{m,k} x_k &= 0.
\end{aligned}$$

Clearly $x_1 = \cdots = x_n = 0$ is a solution of the system of equations above; the 
question here is whether any other solutions exist. 
Define $T\colon \mathbf{F}^n \to \mathbf{F}^m$ by 

3.25    $T(x_1, \dots, x_n) = \Bigl(\sum_{k=1}^{n} A_{1,k} x_k, \; \dots, \; \sum_{k=1}^{n} A_{m,k} x_k\Bigr).$ 

The equation $T(x_1, \dots, x_n) = 0$ (the 0 here is the additive identity in $\mathbf{F}^m$, namely, 
the list of length $m$ of all 0's) is the same as the homogeneous system of linear 
equations above. 

Thus we want to know if null T is strictly bigger than {0}, which is equivalent 
to T not being injective (by 3.15). The next result gives an important condition 
for ensuring that T is not injective. 
3.26 homogeneous system of linear equations 


A homogeneous system of linear equations with more variables than equations 
has nonzero solutions. 


Proof  Use the notation and result from the discussion above. Thus T is a linear
map from $\mathbf{F}^n$ to $\mathbf{F}^m$, and we have a homogeneous system of m linear equations
with n variables $x_1, \dots, x_n$. From 3.22 we see that T is not injective if $n > m$.


Example of the result above: a homogeneous system of four linear equations 
with five variables has nonzero solutions. 
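
Result 3.26 guarantees that a nonzero solution exists but does not say how to find one. The sketch below is one way to produce such a solution, using Gaussian elimination with exact rational arithmetic (a technique not used in the text's proof). The function name `nonzero_solution` and the sample coefficients are illustrative assumptions.

```python
from fractions import Fraction

def nonzero_solution(A):
    """Return a nonzero x with A x = 0, assuming A has more columns than rows."""
    m, n = len(A), len(A[0])
    rows = [[Fraction(a) for a in row] for row in A]
    pivots = []  # pivots[i] is the pivot column of reduced row i
    r = 0
    for c in range(n):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        rows[r] = [a / rows[r][c] for a in rows[r]]
        for i in range(m):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    # There are at most m pivots and n > m, so some column has no pivot.
    free = next(c for c in range(n) if c not in pivots)
    x = [Fraction(0)] * n
    x[free] = Fraction(1)
    for i, c in enumerate(pivots):
        x[c] = -rows[i][free]
    return x

# Two equations, three variables (hypothetical coefficients).
A = [[1, 2, 3], [4, 5, 6]]
x = nonzero_solution(A)
```

Here `x` is a nonzero list satisfying both equations, which is what 3.26 predicts because n = 3 > m = 2.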

Now we consider the question of whether an inhomogeneous system of linear
equations has no solutions for some choice of the constant terms. To rephrase
this question in terms of a linear map, fix positive integers m and n, and let
$A_{j,k} \in \mathbf{F}$ for all $j = 1, \dots, m$ and all $k = 1, \dots, n$.

For $c_1, \dots, c_m \in \mathbf{F}$, consider the system of linear equations

\[
\begin{aligned}
\sum_{k=1}^{n} A_{1,k} x_k &= c_1 \\
&\;\;\vdots \\
\sum_{k=1}^{n} A_{m,k} x_k &= c_m.
\end{aligned}
\tag{3.27}
\]

Inhomogeneous, as used in this context, means that the constant term on the
right side of at least one equation above does not equal 0.

The question here is whether there is some choice of $c_1, \dots, c_m \in \mathbf{F}$ such that no
solution exists to the system above.

Define $T\colon \mathbf{F}^n \to \mathbf{F}^m$ as in 3.25. The equation $T(x_1, \dots, x_n) = (c_1, \dots, c_m)$ is the
same as the system of equations 3.27. Thus we want to know if $\operatorname{range} T \ne \mathbf{F}^m$.
Hence we can rephrase our question about not having a solution for some
choice of $c_1, \dots, c_m \in \mathbf{F}$ as follows: What condition ensures that T is not
surjective? The next result gives one such condition.

The results 3.26 and 3.28, which compare the number of variables and
the number of equations, can also be proved using Gaussian elimination.
The abstract approach taken here seems to provide cleaner proofs.


3.28 inhomogeneous system of linear equations 


An inhomogeneous system of linear equations with more equations than 
variables has no solution for some choice of the constant terms. 


Proof  Use the notation and result from the example above. Thus T is a linear
map from $\mathbf{F}^n$ to $\mathbf{F}^m$, and we have a system of m equations with n variables $x_1, \dots, x_n$.
From 3.24 we see that T is not surjective if $n < m$.


Example of the result above: an inhomogeneous system of five linear equations 
with four variables has no solution for some choice of the constant terms. 
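
To make 3.28 concrete, the sketch below searches for constant terms with no solution. It relies on a standard fact not proved in this section: the system Ax = c is solvable exactly when appending c to A as an extra column does not increase the dimension of the span of the columns. The names `rank` and `unsolvable_rhs` and the sample matrix are illustrative assumptions.

```python
from fractions import Fraction

def rank(M):
    """Dimension of the span of the rows of M, by forward elimination."""
    rows = [[Fraction(a) for a in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def unsolvable_rhs(A):
    """Return a standard basis vector c for which A x = c has no solution, if any."""
    m = len(A)
    for i in range(m):
        c = [1 if j == i else 0 for j in range(m)]
        augmented = [row + [cj] for row, cj in zip(A, c)]
        if rank(augmented) > rank(A):
            return c
    return None  # every standard basis vector is in the range, so T is surjective

# Three equations, two variables (hypothetical coefficients).
A = [[1, 0], [0, 1], [1, 1]]
c = unsolvable_rhs(A)
```

Only standard basis vectors are tried because if all of them lay in the range of T, then the range would be all of $\mathbf{F}^m$. With more equations than variables, 3.28 says that cannot happen, so the search succeeds.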
Exercises 3B

1  Give an example of a linear map T with dim null T = 3 and dim range T = 2.

2  Suppose $S, T \in \mathcal{L}(V)$ are such that $\operatorname{range} S \subseteq \operatorname{null} T$. Prove that $(ST)^2 = 0$.

3  Suppose $v_1, \dots, v_m$ is a list of vectors in V. Define $T \in \mathcal{L}(\mathbf{F}^m, V)$ by

   \[ T(z_1, \dots, z_m) = z_1 v_1 + \cdots + z_m v_m. \]

   (a) What property of T corresponds to $v_1, \dots, v_m$ spanning V?
   (b) What property of T corresponds to the list $v_1, \dots, v_m$ being linearly
       independent?

4  Show that $\{T \in \mathcal{L}(\mathbf{R}^5, \mathbf{R}^4) : \dim \operatorname{null} T > 2\}$ is not a subspace of $\mathcal{L}(\mathbf{R}^5, \mathbf{R}^4)$.

5  Give an example of $T \in \mathcal{L}(\mathbf{R}^4)$ such that range T = null T.

6  Prove that there does not exist $T \in \mathcal{L}(\mathbf{R}^5)$ such that range T = null T.

7  Suppose V and W are finite-dimensional with $2 \le \dim V \le \dim W$. Show
   that $\{T \in \mathcal{L}(V, W) : T \text{ is not injective}\}$ is not a subspace of $\mathcal{L}(V, W)$.

8  Suppose V and W are finite-dimensional with $\dim V \ge \dim W \ge 2$. Show
   that $\{T \in \mathcal{L}(V, W) : T \text{ is not surjective}\}$ is not a subspace of $\mathcal{L}(V, W)$.

9  Suppose $T \in \mathcal{L}(V, W)$ is injective and $v_1, \dots, v_n$ is linearly independent in V.
   Prove that $Tv_1, \dots, Tv_n$ is linearly independent in W.

10  Suppose $v_1, \dots, v_n$ spans V and $T \in \mathcal{L}(V, W)$. Show that $Tv_1, \dots, Tv_n$ spans
    range T.

11  Suppose that V is finite-dimensional and that $T \in \mathcal{L}(V, W)$. Prove that there
    exists a subspace U of V such that

    \[ U \cap \operatorname{null} T = \{0\} \quad\text{and}\quad \operatorname{range} T = \{Tu : u \in U\}. \]

12  Suppose T is a linear map from $\mathbf{F}^4$ to $\mathbf{F}^2$ such that

    \[ \operatorname{null} T = \{(x_1, x_2, x_3, x_4) \in \mathbf{F}^4 : x_1 = 5x_2 \text{ and } x_3 = 7x_4\}. \]

    Prove that T is surjective.

13  Suppose U is a three-dimensional subspace of $\mathbf{R}^8$ and that T is a linear map
    from $\mathbf{R}^8$ to $\mathbf{R}^5$ such that null T = U. Prove that T is surjective.

14  Prove that there does not exist a linear map from $\mathbf{F}^5$ to $\mathbf{F}^2$ whose null space
    equals $\{(x_1, x_2, x_3, x_4, x_5) \in \mathbf{F}^5 : x_1 = 3x_2 \text{ and } x_3 = x_4 = x_5\}$.

15  Suppose there exists a linear map on V whose null space and range are both
    finite-dimensional. Prove that V is finite-dimensional.


16  Suppose V and W are both finite-dimensional. Prove that there exists an
    injective linear map from V to W if and only if $\dim V \le \dim W$.

17  Suppose V and W are both finite-dimensional. Prove that there exists a
    surjective linear map from V onto W if and only if $\dim V \ge \dim W$.

18  Suppose V and W are finite-dimensional and that U is a subspace of V.
    Prove that there exists $T \in \mathcal{L}(V, W)$ such that null T = U if and only if
    $\dim U \ge \dim V - \dim W$.

19  Suppose W is finite-dimensional and $T \in \mathcal{L}(V, W)$. Prove that T is injective
    if and only if there exists $S \in \mathcal{L}(W, V)$ such that ST is the identity operator
    on V.

20  Suppose W is finite-dimensional and $T \in \mathcal{L}(V, W)$. Prove that T is surjective
    if and only if there exists $S \in \mathcal{L}(W, V)$ such that TS is the identity operator
    on W.

21  Suppose V is finite-dimensional, $T \in \mathcal{L}(V, W)$, and U is a subspace of W.
    Prove that $\{v \in V : Tv \in U\}$ is a subspace of V and

    \[ \dim \{v \in V : Tv \in U\} = \dim \operatorname{null} T + \dim (U \cap \operatorname{range} T). \]

22  Suppose U and V are finite-dimensional vector spaces and $S \in \mathcal{L}(V, W)$ and
    $T \in \mathcal{L}(U, V)$. Prove that

    \[ \dim \operatorname{null} ST \le \dim \operatorname{null} S + \dim \operatorname{null} T. \]

23  Suppose U and V are finite-dimensional vector spaces and $S \in \mathcal{L}(V, W)$ and
    $T \in \mathcal{L}(U, V)$. Prove that

    \[ \dim \operatorname{range} ST \le \min\{\dim \operatorname{range} S, \dim \operatorname{range} T\}. \]

24  (a) Suppose dim V = 5 and $S, T \in \mathcal{L}(V)$ are such that ST = 0. Prove that
        $\dim \operatorname{range} TS \le 2$.
    (b) Give an example of $S, T \in \mathcal{L}(\mathbf{F}^5)$ with ST = 0 and dim range TS = 2.

25  Suppose that W is finite-dimensional and $S, T \in \mathcal{L}(V, W)$. Prove that
    $\operatorname{null} S \subseteq \operatorname{null} T$ if and only if there exists $E \in \mathcal{L}(W)$ such that T = ES.

26  Suppose that V is finite-dimensional and $S, T \in \mathcal{L}(V, W)$. Prove that
    $\operatorname{range} S \subseteq \operatorname{range} T$ if and only if there exists $E \in \mathcal{L}(V)$ such that S = TE.

27  Suppose $P \in \mathcal{L}(V)$ and $P^2 = P$. Prove that $V = \operatorname{null} P \oplus \operatorname{range} P$.

28  Suppose $D \in \mathcal{L}(\mathcal{P}(\mathbf{R}))$ is such that $\deg Dp = (\deg p) - 1$ for every
    nonconstant polynomial $p \in \mathcal{P}(\mathbf{R})$. Prove that D is surjective.
    The notation D is used above to remind you of the differentiation map that
    sends a polynomial p to p'.

29  Suppose $p \in \mathcal{P}(\mathbf{R})$. Prove that there exists a polynomial $q \in \mathcal{P}(\mathbf{R})$ such
    that $5q'' + 3q' = p$.
    This exercise can be done without linear algebra, but it's more fun to do it
    using linear algebra.

30  Suppose $\varphi \in \mathcal{L}(V, \mathbf{F})$ and $\varphi \ne 0$. Suppose $u \in V$ is not in null $\varphi$. Prove
    that

    \[ V = \operatorname{null} \varphi \oplus \{au : a \in \mathbf{F}\}. \]

31  Suppose V is finite-dimensional, X is a subspace of V, and Y is a
    finite-dimensional subspace of W. Prove that there exists $T \in \mathcal{L}(V, W)$ such that
    null T = X and range T = Y if and only if dim X + dim Y = dim V.

32  Suppose V is finite-dimensional with dim V > 1. Show that if $\varphi\colon \mathcal{L}(V) \to \mathbf{F}$
    is a linear map such that $\varphi(ST) = \varphi(S)\varphi(T)$ for all $S, T \in \mathcal{L}(V)$, then
    $\varphi = 0$.
    Hint: The description of the two-sided ideals of $\mathcal{L}(V)$ given by Exercise 17
    in Section 3A might be useful.

33  Suppose that V and W are real vector spaces and $T \in \mathcal{L}(V, W)$. Define
    $T_{\mathbf{C}}\colon V_{\mathbf{C}} \to W_{\mathbf{C}}$ by

    \[ T_{\mathbf{C}}(u + iv) = Tu + iTv \]

    for all $u, v \in V$.

    (a) Show that $T_{\mathbf{C}}$ is a (complex) linear map from $V_{\mathbf{C}}$ to $W_{\mathbf{C}}$.
    (b) Show that $T_{\mathbf{C}}$ is injective if and only if T is injective.
    (c) Show that $\operatorname{range} T_{\mathbf{C}} = W_{\mathbf{C}}$ if and only if range T = W.

    See Exercise 8 in Section 1B for the definition of the complexification $V_{\mathbf{C}}$.
    The linear map $T_{\mathbf{C}}$ is called the complexification of the linear map T.
3C. Matrices 


Representing a Linear Map by a Matrix 


We know that if $v_1, \dots, v_n$ is a basis of V and $T\colon V \to W$ is linear, then the values
of $Tv_1, \dots, Tv_n$ determine the values of T on arbitrary vectors in V (see the linear
map lemma, 3.4). As we will soon see, matrices provide an efficient method of
recording the values of the $Tv_k$'s in terms of a basis of W.


3.29 definition: matrix, $A_{j,k}$


Suppose m and n are nonnegative integers. An m-by-n matrix A is a rectangular
array of elements of F with m rows and n columns:

\[
A = \begin{pmatrix}
A_{1,1} & \cdots & A_{1,n} \\
\vdots & & \vdots \\
A_{m,1} & \cdots & A_{m,n}
\end{pmatrix}.
\]

The notation $A_{j,k}$ denotes the entry in row j, column k of A.


3.30 example: $A_{j,k}$ equals entry in row j, column k of A


Suppose $A = \begin{pmatrix} 8 & 4 & 5 - 3i \\ 1 & 9 & 7 \end{pmatrix}$.
Thus $A_{2,3}$ refers to the entry in the second row, third column of A, which means
that $A_{2,3} = 7$.

When dealing with matrices, the first index refers to the row number; the
second index refers to the column number.


Now we come to the key definition in this section. 


3.31 definition: matrix of a linear map, $\mathcal{M}(T)$


Suppose $T \in \mathcal{L}(V, W)$ and $v_1, \dots, v_n$ is a basis of V and $w_1, \dots, w_m$ is a basis
of W. The matrix of T with respect to these bases is the m-by-n matrix $\mathcal{M}(T)$
whose entries $A_{j,k}$ are defined by

\[ Tv_k = A_{1,k} w_1 + \cdots + A_{m,k} w_m. \]

If the bases $v_1, \dots, v_n$ and $w_1, \dots, w_m$ are not clear from the context, then the
notation $\mathcal{M}\bigl(T, (v_1, \dots, v_n), (w_1, \dots, w_m)\bigr)$ is used.


The matrix $\mathcal{M}(T)$ of a linear map $T \in \mathcal{L}(V, W)$ depends on the basis $v_1, \dots, v_n$
of V and the basis $w_1, \dots, w_m$ of W, as well as on T. However, the bases should be
clear from the context, and thus they are often not included in the notation.

To remember how $\mathcal{M}(T)$ is constructed from T, you might write across the
top of the matrix the basis vectors $v_1, \dots, v_n$ for the domain and along the left the
basis vectors $w_1, \dots, w_m$ for the vector space into which T maps, as follows:

\[
\mathcal{M}(T) \;=\;
\begin{array}{c|ccccc}
 & v_1 & \cdots & v_k & \cdots & v_n \\ \hline
w_1 & & & A_{1,k} & & \\
\vdots & & & \vdots & & \\
w_m & & & A_{m,k} & &
\end{array}
\]

In the matrix above only the k-th column is shown. Thus the second index of
each displayed entry of the matrix above is k. The picture above should remind you
that $Tv_k$ can be computed from $\mathcal{M}(T)$ by multiplying each entry in the k-th column
by the corresponding $w_j$ from the left column, and then adding up the resulting
vectors.

The k-th column of $\mathcal{M}(T)$ consists of the scalars needed to write $Tv_k$ as a
linear combination of $w_1, \dots, w_m$:

\[ Tv_k = \sum_{j=1}^{m} A_{j,k} w_j. \]

If T is a linear map from $\mathbf{F}^n$ to $\mathbf{F}^m$, then unless stated otherwise, assume the
bases in question are the standard ones (where the k-th basis vector is 1 in the k-th
slot and 0 in all other slots). If you think of elements of $\mathbf{F}^m$ as columns of
m numbers, then you can think of the k-th column of $\mathcal{M}(T)$ as T applied to the
k-th standard basis vector.

If T is a linear map from an n-dimensional vector space to an m-dimensional
vector space, then $\mathcal{M}(T)$ is an m-by-n matrix.


3.32 example: the matrix of a linear map from $\mathbf{F}^2$ to $\mathbf{F}^3$


Suppose $T \in \mathcal{L}(\mathbf{F}^2, \mathbf{F}^3)$ is defined by

\[ T(x, y) = (x + 3y, \; 2x + 5y, \; 7x + 9y). \]

Because T(1,0) = (1,2,7) and T(0,1) = (3,5,9), the matrix of T with respect
to the standard bases is the 3-by-2 matrix below:

\[
\mathcal{M}(T) = \begin{pmatrix} 1 & 3 \\ 2 & 5 \\ 7 & 9 \end{pmatrix}.
\]

When working with $\mathcal{P}_m(\mathbf{F})$, use the standard basis $1, x, x^2, \dots, x^m$ unless the
context indicates otherwise.


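3.33 example: matrix of the differentiation map from $\mathcal{P}_3(\mathbf{R})$ to $\mathcal{P}_2(\mathbf{R})$


Suppose $D \in \mathcal{L}(\mathcal{P}_3(\mathbf{R}), \mathcal{P}_2(\mathbf{R}))$ is the differentiation map defined by Dp = p'.
Because $(x^n)' = n x^{n-1}$, the matrix of D with respect to the standard bases is the
3-by-4 matrix below:

\[
\mathcal{M}(D) = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.
\]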
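
The recipe "column k of the matrix is the map applied to the k-th standard basis vector" can be checked mechanically. The sketch below rebuilds the matrices of Examples 3.32 and 3.33. It assumes a polynomial in $\mathcal{P}_3(\mathbf{R})$ is stored as its coefficient list with respect to $1, x, x^2, x^3$; the helper name `matrix_of` is hypothetical.

```python
def matrix_of(T, n):
    """Matrix (as a list of rows) of a linear map T on F^n, standard bases.

    Column k of the result is T applied to the k-th standard basis vector.
    """
    columns = [T([1 if i == k else 0 for i in range(n)]) for k in range(n)]
    m = len(columns[0])
    return [[columns[k][j] for k in range(n)] for j in range(m)]

def T(v):
    # The map of Example 3.32.
    x, y = v
    return [x + 3 * y, 2 * x + 5 * y, 7 * x + 9 * y]

def D(p):
    # Differentiation on coefficient lists: [a0, a1, a2, a3] -> [a1, 2*a2, 3*a3].
    return [k * p[k] for k in range(1, len(p))]

M_T = matrix_of(T, 2)
M_D = matrix_of(D, 4)
```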
Addition and Scalar Multiplication of Matrices 


For the rest of this section, assume that U, V, and W are finite-dimensional and 
that a basis has been chosen for each of these vector spaces. Thus for each linear 
map from V to W, we can talk about its matrix (with respect to the chosen bases). 

Is the matrix of the sum of two linear maps equal to the sum of the matrices of 
the two maps? Right now this question does not yet make sense because although 
we have defined the sum of two linear maps, we have not defined the sum of two 
matrices. Fortunately, the natural definition of the sum of two matrices has the 
right properties. Specifically, we make the following definition. 


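3.34 definition: matrix addition


The sum of two matrices of the same size is the matrix obtained by adding
corresponding entries in the matrices:

\[
\begin{pmatrix}
A_{1,1} & \cdots & A_{1,n} \\
\vdots & & \vdots \\
A_{m,1} & \cdots & A_{m,n}
\end{pmatrix}
+
\begin{pmatrix}
C_{1,1} & \cdots & C_{1,n} \\
\vdots & & \vdots \\
C_{m,1} & \cdots & C_{m,n}
\end{pmatrix}
=
\begin{pmatrix}
A_{1,1} + C_{1,1} & \cdots & A_{1,n} + C_{1,n} \\
\vdots & & \vdots \\
A_{m,1} + C_{m,1} & \cdots & A_{m,n} + C_{m,n}
\end{pmatrix}.
\]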


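In the next result, the assumption is that the same bases are used for all three
linear maps S + T, S, and T.


3.35 matrix of the sum of linear maps


Suppose $S, T \in \mathcal{L}(V, W)$. Then $\mathcal{M}(S + T) = \mathcal{M}(S) + \mathcal{M}(T)$.


The verification of the result above follows from the definitions and is left to
the reader.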

Still assuming that we have some bases in mind, is the matrix of a scalar times 
a linear map equal to the scalar times the matrix of the linear map? Again, the 
question does not yet make sense because we have not defined scalar multiplication 
on matrices. Fortunately, the natural definition again has the right properties. 


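3.36 definition: scalar multiplication of a matrix


The product of a scalar and a matrix is the matrix obtained by multiplying
each entry in the matrix by the scalar:

\[
\lambda
\begin{pmatrix}
A_{1,1} & \cdots & A_{1,n} \\
\vdots & & \vdots \\
A_{m,1} & \cdots & A_{m,n}
\end{pmatrix}
=
\begin{pmatrix}
\lambda A_{1,1} & \cdots & \lambda A_{1,n} \\
\vdots & & \vdots \\
\lambda A_{m,1} & \cdots & \lambda A_{m,n}
\end{pmatrix}.
\]


3.37 example: addition and scalar multiplication of matrices

\[
2 \begin{pmatrix} 3 & 1 \\ -1 & 5 \end{pmatrix}
+ \begin{pmatrix} 4 & 2 \\ 1 & 6 \end{pmatrix}
= \begin{pmatrix} 6 & 2 \\ -2 & 10 \end{pmatrix}
+ \begin{pmatrix} 4 & 2 \\ 1 & 6 \end{pmatrix}
= \begin{pmatrix} 10 & 4 \\ -1 & 16 \end{pmatrix}
\]

In the next result, the assumption is that the same bases are used for both the
linear maps $\lambda T$ and T.


3.38 the matrix of a scalar times a linear map


Suppose $\lambda \in \mathbf{F}$ and $T \in \mathcal{L}(V, W)$. Then $\mathcal{M}(\lambda T) = \lambda \mathcal{M}(T)$.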


The verification of the result above is also left to the reader. 
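
As a quick check of the entrywise definitions 3.34 and 3.36, the following sketch recomputes the arithmetic of Example 3.37. Matrices are represented as lists of rows, and the helper names `add` and `scale` are hypothetical.

```python
def add(A, B):
    """Entrywise sum of two matrices of the same size (definition 3.34)."""
    assert len(A) == len(B) and all(len(r) == len(s) for r, s in zip(A, B))
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]

def scale(lam, A):
    """Multiply every entry of A by the scalar lam (definition 3.36)."""
    return [[lam * a for a in row] for row in A]

A = [[3, 1], [-1, 5]]
B = [[4, 2], [1, 6]]
result = add(scale(2, A), B)
```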

Because addition and scalar multiplication have now been defined for matrices, 
you should not be surprised that a vector space is about to appear. First we 
introduce a bit of notation so that this new vector space has a name, and then we 
find the dimension of this new vector space. 


3.39 notation: $\mathbf{F}^{m,n}$


For m and n positive integers, the set of all m-by-n matrices with entries in F
is denoted by $\mathbf{F}^{m,n}$.


3.40 $\dim \mathbf{F}^{m,n} = mn$


Suppose m and n are positive integers. With addition and scalar multiplication
defined as above, $\mathbf{F}^{m,n}$ is a vector space of dimension mn.


Proof  The verification that $\mathbf{F}^{m,n}$ is a vector space is left to the reader. Note that
the additive identity of $\mathbf{F}^{m,n}$ is the m-by-n matrix all of whose entries equal 0.

The reader should also verify that the list of distinct m-by-n matrices that have
0 in all entries except for a 1 in one entry is a basis of $\mathbf{F}^{m,n}$. There are mn such
matrices, so the dimension of $\mathbf{F}^{m,n}$ equals mn.


Matrix Multiplication 


Suppose, as previously, that $v_1, \dots, v_n$ is a basis of V and $w_1, \dots, w_m$ is a basis of W.
Suppose also that $u_1, \dots, u_p$ is a basis of U.

Consider linear maps $T\colon U \to V$ and $S\colon V \to W$. The composition ST is a
linear map from U to W. Does $\mathcal{M}(ST)$ equal $\mathcal{M}(S)\mathcal{M}(T)$? This question does
not yet make sense because we have not defined the product of two matrices. We
will choose a definition of matrix multiplication that forces this question to have
a positive answer. Let's see how to do this.

Suppose $\mathcal{M}(S) = A$ and $\mathcal{M}(T) = B$. For $1 \le k \le p$, we have

\[
\begin{aligned}
(ST)u_k &= S\Bigl( \sum_{r=1}^{n} B_{r,k} v_r \Bigr) \\
&= \sum_{r=1}^{n} B_{r,k} S v_r \\
&= \sum_{r=1}^{n} B_{r,k} \sum_{j=1}^{m} A_{j,r} w_j \\
&= \sum_{j=1}^{m} \Bigl( \sum_{r=1}^{n} A_{j,r} B_{r,k} \Bigr) w_j.
\end{aligned}
\]

Thus $\mathcal{M}(ST)$ is the m-by-p matrix whose entry in row j, column k, equals

\[ \sum_{r=1}^{n} A_{j,r} B_{r,k}. \]

Now we see how to define matrix multiplication so that the desired equation
$\mathcal{M}(ST) = \mathcal{M}(S)\mathcal{M}(T)$ holds.


3.41 definition: matrix multiplication


Suppose A is an m-by-n matrix and B is an n-by-p matrix. Then AB is defined
to be the m-by-p matrix whose entry in row j, column k, is given by the equation

\[ (AB)_{j,k} = \sum_{r=1}^{n} A_{j,r} B_{r,k}. \]

Thus the entry in row j, column k, of AB is computed by taking row j of A and
column k of B, multiplying together corresponding entries, and then summing.


Note that we define the product of two matrices only when the number of
columns of the first matrix equals the number of rows of the second matrix.

You may have learned this definition of matrix multiplication in an earlier
course, although you may not have seen this motivation for it.


3.42 example: matrix multiplication


Here we multiply together a 3-by-2 matrix and a 2-by-4 matrix, obtaining a
3-by-4 matrix:

\[
\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}
\begin{pmatrix} 6 & 5 & 4 & 3 \\ 2 & 1 & 0 & -1 \end{pmatrix}
=
\begin{pmatrix} 10 & 7 & 4 & 1 \\ 26 & 19 & 12 & 5 \\ 42 & 31 & 20 & 9 \end{pmatrix}.
\]


Matrix multiplication is not commutative—AB is not necessarily equal to 
BA even if both products are defined (see Exercise 10). Matrix multiplication is 
distributive and associative (see Exercises 11 and 12). 
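
Definition 3.41 translates directly into code. The sketch below recomputes the product of Example 3.42. For one sample vector it also checks that applying the product matrix agrees with applying the two factors in succession, which is the motivation behind the definition. The helper names `matmul` and `apply_matrix` and the test vector are illustrative assumptions.

```python
def matmul(A, B):
    """Product of an m-by-n matrix A and an n-by-p matrix B (definition 3.41)."""
    n = len(B)
    assert all(len(row) == n for row in A), "columns of A must match rows of B"
    p = len(B[0])
    return [[sum(A[j][r] * B[r][k] for r in range(n)) for k in range(p)]
            for j in range(len(A))]

def apply_matrix(M, x):
    """Apply M to the vector x, viewing M as a linear map in standard bases."""
    return [sum(M[j][k] * x[k] for k in range(len(x))) for j in range(len(M))]

A = [[1, 2], [3, 4], [5, 6]]
B = [[6, 5, 4, 3], [2, 1, 0, -1]]
AB = matmul(A, B)

x = [1, -2, 0, 3]  # arbitrary test vector in F^4
composed = apply_matrix(A, apply_matrix(B, x))
direct = apply_matrix(AB, x)
```

Agreement on a single vector is only a spot check; the general statement is the calculation carried out before definition 3.41.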
In the next result, we assume that the same basis of V is used in considering
$T \in \mathcal{L}(U, V)$ and $S \in \mathcal{L}(V, W)$, the same basis of W is used in considering
$S \in \mathcal{L}(V, W)$ and $ST \in \mathcal{L}(U, W)$, and the same basis of U is used in considering
$T \in \mathcal{L}(U, V)$ and $ST \in \mathcal{L}(U, W)$.


3.43 matrix of product of linear maps


If $T \in \mathcal{L}(U, V)$ and $S \in \mathcal{L}(V, W)$, then $\mathcal{M}(ST) = \mathcal{M}(S)\mathcal{M}(T)$.


The proof of the result above is the calculation that was done as motivation 
before the definition of matrix multiplication. 

In the next piece of notation, note that as usual the first index refers to a row 
and the second index refers to a column, with a vertically centered dot used as a 
placeholder. 


3.44 notation: $A_{j,\cdot}$, $A_{\cdot,k}$


Suppose A is an m-by-n matrix.

• If $1 \le j \le m$, then $A_{j,\cdot}$ denotes the 1-by-n matrix consisting of row j of A.

• If $1 \le k \le n$, then $A_{\cdot,k}$ denotes the m-by-1 matrix consisting of column k
  of A.


3.45 example: $A_{j,\cdot}$ equals j-th row of A and $A_{\cdot,k}$ equals k-th column of A


The notation $A_{2,\cdot}$ denotes the second row of A and $A_{\cdot,2}$ denotes the second
column of A. Thus if $A = \begin{pmatrix} 8 & 4 & 5 \\ 1 & 9 & 7 \end{pmatrix}$, then

\[
A_{2,\cdot} = \begin{pmatrix} 1 & 9 & 7 \end{pmatrix}
\quad\text{and}\quad
A_{\cdot,2} = \begin{pmatrix} 4 \\ 9 \end{pmatrix}.
\]

The product of a 1-by-n matrix and an n-by-1 matrix is a 1-by-1 matrix.
However, we will frequently identify a 1-by-1 matrix with its entry. For example,

\[
\begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 6 \\ 2 \end{pmatrix} = \begin{pmatrix} 26 \end{pmatrix}
\]

because $3 \cdot 6 + 4 \cdot 2 = 26$. However, we can identify $\begin{pmatrix} 26 \end{pmatrix}$ with 26, writing

\[
\begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 6 \\ 2 \end{pmatrix} = 26.
\]

The next result uses the convention discussed in the paragraph above to give
another way to think of matrix multiplication. For example, the next result and
the calculation in the paragraph above explain why the entry in row 2, column 1,
of the product in Example 3.42 equals 26.
3.46 entry of matrix product equals row times column


Suppose A is an m-by-n matrix and B is an n-by-p matrix. Then

\[ (AB)_{j,k} = A_{j,\cdot} B_{\cdot,k} \]

if $1 \le j \le m$ and $1 \le k \le p$. In other words, the entry in row j, column k, of
AB equals (row j of A) times (column k of B).


Proof  Suppose $1 \le j \le m$ and $1 \le k \le p$. The definition of matrix multiplication
states that

\[ (AB)_{j,k} = A_{j,1} B_{1,k} + \cdots + A_{j,n} B_{n,k}. \tag{3.47} \]

The definition of matrix multiplication also implies that the product of the 1-by-n
matrix $A_{j,\cdot}$ and the n-by-1 matrix $B_{\cdot,k}$ is the 1-by-1 matrix whose entry is the
number on the right side of the equation above. Thus the entry in row j, column k,
of AB equals (row j of A) times (column k of B).


The next result gives yet another way to think of matrix multiplication. In the
result below, $(AB)_{\cdot,k}$ is column k of the m-by-p matrix AB. Thus $(AB)_{\cdot,k}$ is an
m-by-1 matrix. Also, $AB_{\cdot,k}$ is an m-by-1 matrix because it is the product of an
m-by-n matrix and an n-by-1 matrix. Thus the two sides of the equation in the
result below have the same size, making it reasonable that they might be equal.


3.48 column of matrix product equals matrix times column


Suppose A is an m-by-n matrix and B is an n-by-p matrix. Then

\[ (AB)_{\cdot,k} = AB_{\cdot,k} \]

if $1 \le k \le p$. In other words, column k of AB equals A times column k of B.


Proof  As discussed above, $(AB)_{\cdot,k}$ and $AB_{\cdot,k}$ are both m-by-1 matrices. If
$1 \le j \le m$, then the entry in row j of $(AB)_{\cdot,k}$ is the left side of 3.47 and the entry in
row j of $AB_{\cdot,k}$ is the right side of 3.47. Thus $(AB)_{\cdot,k} = AB_{\cdot,k}$.


Our next result will give another way of thinking about the product of an 
m-by-n matrix and an n-by-1 matrix, motivated by the next example. 


3.49 example: product of a 3-by-2 matrix and a 2-by-1 matrix 


Use our definitions and basic arithmetic to verify that 


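\[
\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}
\begin{pmatrix} 5 \\ 1 \end{pmatrix}
= \begin{pmatrix} 7 \\ 19 \\ 31 \end{pmatrix}
= 5 \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}
+ 1 \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix}.
\]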


Thus in this example, the product of a 3-by-2 matrix and a 2-by-1 matrix is a 
linear combination of the columns of the 3-by-2 matrix, with the scalars (5 and 1) 
that multiply the columns coming from the 2-by-1 matrix. 
The next result generalizes the example above.


3.50 linear combination of columns


Suppose A is an m-by-n matrix and $b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$ is an n-by-1 matrix. Then

\[ Ab = b_1 A_{\cdot,1} + \cdots + b_n A_{\cdot,n}. \]

In other words, Ab is a linear combination of the columns of A, with the
scalars that multiply the columns coming from b.


Proof  If $k \in \{1, \dots, m\}$, then the definition of matrix multiplication implies that
the entry in row k of the m-by-1 matrix Ab is

\[ A_{k,1} b_1 + \cdots + A_{k,n} b_n. \]

The entry in row k of $b_1 A_{\cdot,1} + \cdots + b_n A_{\cdot,n}$ also equals the number displayed above.
Because Ab and $b_1 A_{\cdot,1} + \cdots + b_n A_{\cdot,n}$ have the same entry in row k for each
$k \in \{1, \dots, m\}$, we conclude that $Ab = b_1 A_{\cdot,1} + \cdots + b_n A_{\cdot,n}$.
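
Result 3.50 can be tested numerically by computing Ab in two ways: row by row from the definition of matrix multiplication, and as a linear combination of the columns of A. The sketch below does this for one illustrative 3-by-2 matrix. The helper names `mat_vec` and `combination_of_columns` are hypothetical.

```python
def mat_vec(A, b):
    """Entry in row k of Ab is A[k][0]*b[0] + ... + A[k][n-1]*b[n-1]."""
    return [sum(a * s for a, s in zip(row, b)) for row in A]

def combination_of_columns(A, b):
    """Compute b[0]*(column 0 of A) + ... + b[n-1]*(column n-1 of A)."""
    result = [0] * len(A)
    for k, scalar in enumerate(b):
        column_k = [row[k] for row in A]
        result = [r + scalar * c for r, c in zip(result, column_k)]
    return result

A = [[1, 2], [3, 4], [5, 6]]
b = [5, 1]
by_rows = mat_vec(A, b)
by_columns = combination_of_columns(A, b)
```

The two lists agree entry by entry, which is the content of the proof above.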


Our two previous results focus on the columns of a matrix. Analogous results 
hold for the rows of a matrix. Specifically, see Exercises 8 and 9, which can be 
proved using appropriate modifications of the proofs of 3.48 and 3.50. 

The next result is the main tool used in the next subsection to prove the
column-row factorization (3.56) and to prove that the column rank of a matrix
equals the row rank (3.57). To be consistent with the notation often used with the
column-row factorization, including in the next subsection, the matrices in the
next result are called C and R instead of A and B.


3.51 matrix multiplication as linear combinations of columns


Suppose C is an m-by-c matrix and R is a c-by-n matrix.

(a) If $k \in \{1, \dots, n\}$, then column k of CR is a linear combination of the
    columns of C, with the coefficients of this linear combination coming
    from column k of R.

(b) If $j \in \{1, \dots, m\}$, then row j of CR is a linear combination of the rows of R,
    with the coefficients of this linear combination coming from row j of C.


Proof  Suppose $k \in \{1, \dots, n\}$. Then column k of CR equals $CR_{\cdot,k}$ (by 3.48),
which equals the linear combination of the columns of C with coefficients coming
from $R_{\cdot,k}$ (by 3.50). Thus (a) holds.

To prove (b), follow the pattern of the proof of (a) but use rows instead of
columns and use Exercises 8 and 9 instead of 3.48 and 3.50.
Column-Row Factorization and Rank of a Matrix


We begin by defining two nonnegative integers associated with each matrix.


3.52 definition: column rank, row rank


Suppose A is an m-by-n matrix with entries in F.

• The column rank of A is the dimension of the span of the columns of A
  in $\mathbf{F}^{m,1}$.

• The row rank of A is the dimension of the span of the rows of A in $\mathbf{F}^{1,n}$.


If A is an m-by-n matrix, then the column rank of A is at most n (because A has
n columns) and the column rank of A is also at most m (because $\dim \mathbf{F}^{m,1} = m$).
Similarly, the row rank of A is also at most min{m, n}.


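3.53 example: column rank and row rank of a 2-by-4 matrix


Suppose

\[ A = \begin{pmatrix} 4 & 7 & 1 & 8 \\ 3 & 5 & 2 & 9 \end{pmatrix}. \]

The column rank of A is the dimension of

\[
\operatorname{span}\left(
\begin{pmatrix} 4 \\ 3 \end{pmatrix},
\begin{pmatrix} 7 \\ 5 \end{pmatrix},
\begin{pmatrix} 1 \\ 2 \end{pmatrix},
\begin{pmatrix} 8 \\ 9 \end{pmatrix}
\right)
\]

in $\mathbf{F}^{2,1}$. Neither of the first two vectors listed above in $\mathbf{F}^{2,1}$ is a scalar multiple of
the other. Thus the span of this list of length four has dimension at least two. The
span of this list of vectors in $\mathbf{F}^{2,1}$ cannot have dimension larger than two because
$\dim \mathbf{F}^{2,1} = 2$. Thus the span of this list has dimension two, which means that the
column rank of A is two.

The row rank of A is the dimension of

\[
\operatorname{span}\bigl(
\begin{pmatrix} 4 & 7 & 1 & 8 \end{pmatrix},
\begin{pmatrix} 3 & 5 & 2 & 9 \end{pmatrix}
\bigr)
\]

in $\mathbf{F}^{1,4}$. Neither of the two vectors listed above in $\mathbf{F}^{1,4}$ is a scalar multiple of the
other. Thus the span of this list of length two has dimension two, which means
that the row rank of A is two.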
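
The two ranks in Example 3.53 can also be computed mechanically. The sketch below finds the dimension of the span of a list of vectors by forward elimination over the rationals (an approach not used in the text), then applies it once to the rows of A and once to the columns of A. The helper name `span_dimension` is hypothetical.

```python
from fractions import Fraction

def span_dimension(vectors):
    """Dimension of the span of the given equal-length vectors."""
    rows = [[Fraction(a) for a in v] for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

A = [[4, 7, 1, 8], [3, 5, 2, 9]]
columns_of_A = [list(col) for col in zip(*A)]

row_rank = span_dimension(A)
column_rank = span_dimension(columns_of_A)
```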


We now define the transpose of a matrix. 


3.54 definition: transpose, $A^t$


The transpose of a matrix A, denoted by $A^t$, is the matrix obtained from A by
interchanging rows and columns. Specifically, if A is an m-by-n matrix, then
$A^t$ is the n-by-m matrix whose entries are given by the equation

\[ (A^t)_{k,j} = A_{j,k}. \]


3.55 example: transpose of a matrix


If $A = \begin{pmatrix} 5 & -7 \\ 3 & 8 \\ -4 & 2 \end{pmatrix}$, then $A^t = \begin{pmatrix} 5 & 3 & -4 \\ -7 & 8 & 2 \end{pmatrix}$.

Note that here A is a 3-by-2 matrix and $A^t$ is a 2-by-3 matrix.


The transpose has nice algebraic properties: $(A + B)^t = A^t + B^t$, $(\lambda A)^t = \lambda A^t$,
and $(AC)^t = C^t A^t$ for all m-by-n matrices A, B, all $\lambda \in \mathbf{F}$, and all n-by-p matrices
C (see Exercises 14 and 15).
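
The identity $(AC)^t = C^t A^t$ is easy to spot-check. The sketch below uses the matrix A of Example 3.55 together with an arbitrarily chosen 2-by-2 matrix C (an assumption made only for illustration). A single numerical check is of course not a proof.

```python
def transpose(A):
    """Interchange rows and columns: entry (k, j) of the result is A[j][k]."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of an m-by-n matrix and an n-by-p matrix, as lists of rows."""
    return [[sum(A[j][r] * B[r][k] for r in range(len(B)))
             for k in range(len(B[0]))] for j in range(len(A))]

A = [[5, -7], [3, 8], [-4, 2]]
C = [[1, 2], [0, -1]]  # hypothetical 2-by-2 matrix

left = transpose(matmul(A, C))
right = matmul(transpose(C), transpose(A))
```

Note that the order of the factors reverses: $A^t C^t$ would be a 2-by-3 matrix times a 2-by-2 matrix, which is not even defined here.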

The next result will be the main tool used to prove that the column rank equals 
the row rank (see 3.57). 


3.56 column-row factorization


Suppose A is an m-by-n matrix with entries in F and column rank $c \ge 1$. Then
there exist an m-by-c matrix C and a c-by-n matrix R, both with entries in F,
such that A = CR.


Proof  Each column of A is an m-by-1 matrix. The list $A_{\cdot,1}, \dots, A_{\cdot,n}$ of columns
of A can be reduced to a basis of the span of the columns of A (by 2.30). This
basis has length c, by the definition of the column rank. The c columns in this
basis can be put together to form an m-by-c matrix C.

If $k \in \{1, \dots, n\}$, then column k of A is a linear combination of the columns
of C. Make the coefficients of this linear combination into column k of a c-by-n
matrix that we call R. Then A = CR, as follows from 3.51(a).
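
The proof above is constructive in spirit, and one way to carry it out by machine is sketched below. It uses Gauss-Jordan elimination, which is not how the text argues: the pivot columns of A serve as the basis of the column span that forms C, and the nonzero rows of the reduced matrix supply the coefficients that form R. The function names and the sample matrix are illustrative assumptions.

```python
from fractions import Fraction

def rref(A):
    """Return the reduced row echelon form of A and its pivot columns."""
    rows = [[Fraction(a) for a in row] for row in A]
    m, n = len(rows), len(rows[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        rows[r] = [a / rows[r][c] for a in rows[r]]
        for i in range(m):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    return rows, pivots

def column_row_factorization(A):
    """Return (C, R) with A = CR, assuming A has at least one nonzero entry."""
    reduced, pivots = rref(A)
    assert pivots, "3.56 assumes column rank at least 1"
    C = [[Fraction(row[c]) for c in pivots] for row in A]  # pivot columns of A
    R = reduced[:len(pivots)]                              # coefficient rows
    return C, R

def matmul(A, B):
    return [[sum(A[j][r] * B[r][k] for r in range(len(B)))
             for k in range(len(B[0]))] for j in range(len(A))]

A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]  # column rank 2
C, R = column_row_factorization(A)
```

For this A the result is a 3-by-2 matrix C and a 2-by-3 matrix R whose product is A. Column k of R lists the coefficients that express column k of A in terms of the columns of C.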


In Example 3.53, the column rank and row rank turned out to equal each other. 
The next result states that this happens for all matrices. 


3.57 column rank equals row rank


Suppose $A \in \mathbf{F}^{m,n}$. Then the column rank of A equals the row rank of A.


Proof  Let c denote the column rank of A. Let A = CR be the column-row
factorization of A given by 3.56, where C is an m-by-c matrix and R is a c-by-n
matrix. Then 3.51(b) tells us that every row of A is a linear combination of the
rows of R. Because R has c rows, this implies that the row rank of A is less than
or equal to the column rank c of A.

To prove the inequality in the other direction, apply the result in the previous
paragraph to $A^t$, getting

\[
\begin{aligned}
\text{column rank of } A &= \text{row rank of } A^t \\
&\le \text{column rank of } A^t \\
&= \text{row rank of } A.
\end{aligned}
\]

Thus the column rank of A equals the row rank of A.
Because the column rank equals the row rank, the last result allows us to
dispense with the terms "column rank" and "row rank" and just use the simpler
term "rank".


3.58 definition: rank


The rank of a matrix $A \in \mathbf{F}^{m,n}$ is the column rank of A.


See 3.133 and Exercise 8 in Section 7A for alternative proofs that the column
rank equals the row rank.


Exercises 3C


1  Suppose $T \in \mathcal{L}(V, W)$. Show that with respect to each choice of bases of V
   and W, the matrix of T has at least dim range T nonzero entries.

2  Suppose V and W are finite-dimensional and $T \in \mathcal{L}(V, W)$. Prove that
   dim range T = 1 if and only if there exist a basis of V and a basis of W such
   that with respect to these bases, all entries of $\mathcal{M}(T)$ equal 1.

3  Suppose $v_1, \dots, v_n$ is a basis of V and $w_1, \dots, w_m$ is a basis of W.

   (a) Show that if $S, T \in \mathcal{L}(V, W)$, then $\mathcal{M}(S + T) = \mathcal{M}(S) + \mathcal{M}(T)$.
   (b) Show that if $\lambda \in \mathbf{F}$ and $T \in \mathcal{L}(V, W)$, then $\mathcal{M}(\lambda T) = \lambda \mathcal{M}(T)$.

   This exercise asks you to verify 3.35 and 3.38.

4  Suppose that $D \in \mathcal{L}(\mathcal{P}_3(\mathbf{R}), \mathcal{P}_2(\mathbf{R}))$ is the differentiation map defined by
   Dp = p'. Find a basis of $\mathcal{P}_3(\mathbf{R})$ and a basis of $\mathcal{P}_2(\mathbf{R})$ such that the matrix of
   D with respect to these bases is

   \[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \]

   Compare with Example 3.33. The next exercise generalizes this exercise.

5  Suppose V and W are finite-dimensional and $T \in \mathcal{L}(V, W)$. Prove that there
   exist a basis of V and a basis of W such that with respect to these bases, all
   entries of $\mathcal{M}(T)$ are 0 except that the entries in row k, column k, equal 1 if
   $1 \le k \le \dim \operatorname{range} T$.

6  Suppose $v_1, \dots, v_m$ is a basis of V and W is finite-dimensional. Suppose
   $T \in \mathcal{L}(V, W)$. Prove that there exists a basis $w_1, \dots, w_n$ of W such that all
   entries in the first column of $\mathcal{M}(T)$ [with respect to the bases $v_1, \dots, v_m$ and
   $w_1, \dots, w_n$] are 0 except for possibly a 1 in the first row, first column.

   In this exercise, unlike Exercise 5, you are given the basis of V instead of
   being able to choose a basis of V.

7  Suppose $w_1, \dots, w_n$ is a basis of W and V is finite-dimensional. Suppose
   $T \in \mathcal{L}(V, W)$. Prove that there exists a basis $v_1, \dots, v_m$ of V such that all
   entries in the first row of $\mathcal{M}(T)$ [with respect to the bases $v_1, \dots, v_m$ and
   $w_1, \dots, w_n$] are 0 except for possibly a 1 in the first row, first column.

   In this exercise, unlike Exercise 5, you are given the basis of W instead of
   being able to choose a basis of W.

8  Suppose A is an m-by-n matrix and B is an n-by-p matrix. Prove that

   \[ (AB)_{j,\cdot} = A_{j,\cdot} B \]

   for each $1 \le j \le m$. In other words, show that row j of AB equals (row j of A)
   times B.

   This exercise gives the row version of 3.48.

9  Suppose $a = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$ is a 1-by-n matrix and B is an n-by-p matrix.
   Prove that

   \[ aB = a_1 B_{1,\cdot} + \cdots + a_n B_{n,\cdot}. \]

   In other words, show that aB is a linear combination of the rows of B, with
   the scalars that multiply the rows coming from a.

   This exercise gives the row version of 3.50.

10  Give an example of 2-by-2 matrices A and B such that $AB \ne BA$.

11  Prove that the distributive property holds for matrix addition and matrix
    multiplication. In other words, suppose A, B, C, D, E, and F are matrices
    whose sizes are such that A(B + C) and (D + E)F make sense. Explain why
    AB + AC and DF + EF both make sense and prove that

    \[ A(B + C) = AB + AC \quad\text{and}\quad (D + E)F = DF + EF. \]

12  Prove that matrix multiplication is associative. In other words, suppose A, B,
    and C are matrices whose sizes are such that (AB)C makes sense. Explain
    why A(BC) makes sense and prove that

    \[ (AB)C = A(BC). \]

    Try to find a clean proof that illustrates the following quote from Emil Artin:
    "It is my experience that proofs involving matrices can be shortened by 50%
    if one throws the matrices out."

13  Suppose A is an n-by-n matrix and $1 \le j, k \le n$. Show that the entry in
    row j, column k, of $A^3$ (which is defined to mean AAA) is

    \[ \sum_{p=1}^{n} \sum_{r=1}^{n} A_{j,p} A_{p,r} A_{r,k}. \]

14  Suppose m and n are positive integers. Prove that the function $A \mapsto A^t$ is a
    linear map from $\mathbf{F}^{m,n}$ to $\mathbf{F}^{n,m}$.

15  Prove that if A is an m-by-n matrix and C is an n-by-p matrix, then

    \[ (AC)^t = C^t A^t. \]

    This exercise shows that the transpose of the product of two matrices is the
    product of the transposes in the opposite order.

16  Suppose A is an m-by-n matrix with $A \ne 0$. Prove that the rank of A is 1
    if and only if there exist $(c_1, \dots, c_m) \in \mathbf{F}^m$ and $(d_1, \dots, d_n) \in \mathbf{F}^n$ such that
    $A_{j,k} = c_j d_k$ for every $j = 1, \dots, m$ and every $k = 1, \dots, n$.

17  Suppose $T \in \mathcal{L}(V)$, and $u_1, \dots, u_n$ and $v_1, \dots, v_n$ are bases of V. Prove that
    the following are equivalent.

    (a) T is injective.
    (b) The columns of $\mathcal{M}(T)$ are linearly independent in $\mathbf{F}^{n,1}$.
    (c) The columns of $\mathcal{M}(T)$ span $\mathbf{F}^{n,1}$.
    (d) The rows of $\mathcal{M}(T)$ span $\mathbf{F}^{1,n}$.
    (e) The rows of $\mathcal{M}(T)$ are linearly independent in $\mathbf{F}^{1,n}$.

    Here $\mathcal{M}(T)$ means $\mathcal{M}\bigl(T, (u_1, \dots, u_n), (v_1, \dots, v_n)\bigr)$.
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Invertible Linear Maps 


We begin this section by defining the notions of invertible and inverse in the 
context of linear maps. 


e A linear map T € £(V, W) is called invertible if there exists a linear map 
S € £(W,V) such that ST equals the identity operator on V and TS equals 


the identity operator on W. 


e A linear map S € L(W,V) satisfying ST = I and TS = J is called an 
inverse of T (note that the first I is the identity operator on V and the second 
Tis the identity operator on W). 


The definition above mentions “an inverse”. However, the next result shows
that we can change this terminology to “the inverse”.


Proof Suppose $T \in \mathcal{L}(V, W)$ is invertible and $S_1$ and $S_2$ are inverses of $T$. Then

\[ S_1 = S_1 I = S_1(T S_2) = (S_1 T) S_2 = I S_2 = S_2. \]

Thus $S_1 = S_2$.


Now that we know that the inverse is unique, we can give it a notation. 


If $T$ is invertible, then its inverse is denoted by $T^{-1}$. In other words, if
$T \in \mathcal{L}(V, W)$ is invertible, then $T^{-1}$ is the unique element of $\mathcal{L}(W, V)$ such
that $T^{-1}T = I$ and $TT^{-1} = I$.


3.62 example: inverse of a linear map from $\mathbf{R}^3$ to $\mathbf{R}^3$


Suppose $T \in \mathcal{L}(\mathbf{R}^3)$ is defined by $T(x, y, z) = (-y, x, 4z)$. Thus $T$ is a
counterclockwise rotation by 90° in the $xy$-plane and a stretch by a factor of 4 in
the direction of the $z$-axis.

Hence the inverse map $T^{-1} \in \mathcal{L}(\mathbf{R}^3)$ is the clockwise rotation by 90° in the
$xy$-plane and a stretch by a factor of $\frac{1}{4}$ in the direction of the $z$-axis:

\[ T^{-1}(x, y, z) = \bigl(y, -x, \tfrac{1}{4}z\bigr). \]
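The formula for $T^{-1}$ in the example above can be checked mechanically. The following is a minimal sketch, not part of the original text; the function names `T` and `T_inv` are chosen here only for illustration. Because both maps are linear, agreement on the standard basis already settles the matter, and one extra vector is included as a sanity check.

```python
from fractions import Fraction

def T(v):
    """The map T(x, y, z) = (-y, x, 4z) from Example 3.62."""
    x, y, z = v
    return (-y, x, 4 * z)

def T_inv(v):
    """The claimed inverse T^{-1}(x, y, z) = (y, -x, z/4), using exact fractions."""
    x, y, z = v
    return (y, -x, Fraction(z) / 4)

# Standard basis of R^3 plus one arbitrary vector.
samples = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, -3, 5)]

for v in samples:
    assert T_inv(T(v)) == v   # T^{-1} T = I
    assert T(T_inv(v)) == v   # T T^{-1} = I
```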
The next result shows that a linear map is invertible if and only if it is
one-to-one and onto.


3.63 invertibility $\iff$ injectivity and surjectivity


A linear map is invertible if and only if it is injective and surjective. 


Proof Suppose $T \in \mathcal{L}(V, W)$. We need to show that $T$ is invertible if and only
if it is injective and surjective.
First suppose $T$ is invertible. To show that $T$ is injective, suppose $u, v \in V$
and $Tu = Tv$. Then

\[ u = T^{-1}(Tu) = T^{-1}(Tv) = v, \]


so u = v. Hence T is injective. 

We are still assuming that $T$ is invertible. Now we want to prove that $T$ is
surjective. To do this, let $w \in W$. Then $w = T(T^{-1}w)$, which shows that $w$ is
in the range of $T$. Thus $\operatorname{range} T = W$. Hence $T$ is surjective, completing this
direction of the proof.

Now suppose $T$ is injective and surjective. We want to prove that $T$ is invertible.
For each $w \in W$, define $S(w)$ to be the unique element of $V$ such that $T(S(w)) = w$
(the existence and uniqueness of such an element follow from the surjectivity and
injectivity of $T$). The definition of $S$ implies that $T \circ S$ equals the identity operator
on $W$.

To prove that $S \circ T$ equals the identity operator on $V$, let $v \in V$. Then

\[ T\bigl((S \circ T)v\bigr) = (T \circ S)(Tv) = I(Tv) = Tv. \]

This equation implies that $(S \circ T)v = v$ (because $T$ is injective). Thus $S \circ T$ equals
the identity operator on $V$.

To complete the proof, we need to show that $S$ is linear. To do this, suppose
$w_1, w_2 \in W$. Then

\[ T\bigl(S(w_1) + S(w_2)\bigr) = T\bigl(S(w_1)\bigr) + T\bigl(S(w_2)\bigr) = w_1 + w_2. \]

Thus $S(w_1) + S(w_2)$ is the unique element of $V$ that $T$ maps to $w_1 + w_2$. By the
definition of $S$, this implies that $S(w_1 + w_2) = S(w_1) + S(w_2)$. Hence $S$ satisfies
the additive property required for linearity.

The proof of homogeneity is similar. Specifically, if $w \in W$ and $\lambda \in \mathbf{F}$, then

\[ T\bigl(\lambda S(w)\bigr) = \lambda T\bigl(S(w)\bigr) = \lambda w. \]

Thus $\lambda S(w)$ is the unique element of $V$ that $T$ maps to $\lambda w$. By the definition of $S$,
this implies that $S(\lambda w) = \lambda S(w)$. Hence $S$ is linear, as desired.


For a linear map from a vector space to itself, you might wonder whether
injectivity alone, or surjectivity alone, is enough to imply invertibility. On
infinite-dimensional vector spaces, neither condition alone implies invertibility, as
illustrated by the next example, which uses two familiar linear maps from Example 3.3.


84 Chapter 3 Linear Maps 


3.64 example: neither injectivity nor surjectivity implies invertibility


• The multiplication by $x^2$ linear map from $\mathcal{P}(\mathbf{R})$ to $\mathcal{P}(\mathbf{R})$ (see 3.3) is injective
but it is not invertible because it is not surjective (the polynomial 1 is not in
the range).


• The backward shift linear map from $\mathbf{F}^\infty$ to $\mathbf{F}^\infty$ (see 3.3) is surjective but it is
not invertible because it is not injective [the vector $(1, 0, 0, 0, \dots)$ is in the null
space].


In view of the example above, the next result is remarkable—it states that for 
a linear map from a finite-dimensional vector space to a vector space of the same 
dimension, either injectivity or surjectivity alone implies the other condition. 
Note that the hypothesis below that dim V = dim W is automatically satisfied in 
the important special case where V is finite-dimensional and W = V. 


3.65 injectivity is equivalent to surjectivity (if $\dim V = \dim W < \infty$)


Suppose that $V$ and $W$ are finite-dimensional vector spaces, $\dim V = \dim W$,
and $T \in \mathcal{L}(V, W)$. Then

\[ T \text{ is invertible} \iff T \text{ is injective} \iff T \text{ is surjective}. \]


Proof The fundamental theorem of linear maps (3.21) states that 
3.66 dim V = dim null T + dim range T. 


If T is injective (which by 3.15 is equivalent to the condition dim null T = 0), 
then the equation above implies that 


\[ \dim \operatorname{range} T = \dim V - \dim \operatorname{null} T = \dim V = \dim W, \]

which implies that $T$ is surjective (by 2.39).
Conversely, if $T$ is surjective, then 3.66 implies that

\[ \dim \operatorname{null} T = \dim V - \dim \operatorname{range} T = \dim V - \dim W = 0, \]


which implies that T is injective. 

Thus we have shown that T is injective if and only if T is surjective. Thus if 
T is either injective or surjective, then T is both injective and surjective, which 
implies that T is invertible. Hence T is invertible if and only if T is injective if 
and only if T is surjective. 


The next example illustrates the power of the previous result. Although it is 
possible to prove the result in the example below without using linear algebra, the 
proof using linear algebra is cleaner and easier. 
3.67 example: there exists a polynomial $p$ such that $\bigl((x^2 + 5x + 7)p\bigr)'' = q$


The linear map

\[ p \mapsto \bigl((x^2 + 5x + 7)p\bigr)'' \]

from $\mathcal{P}(\mathbf{R})$ to itself is injective, as you can show. Thus we are tempted to use 3.65
to show that this map is surjective. However, Example 3.64 shows that the magic
of 3.65 does not apply to the infinite-dimensional vector space $\mathcal{P}(\mathbf{R})$. We will
get around this problem by restricting attention to the finite-dimensional vector
space $\mathcal{P}_m(\mathbf{R})$.

Suppose $q \in \mathcal{P}(\mathbf{R})$. There exists a nonnegative integer $m$ such that $q \in \mathcal{P}_m(\mathbf{R})$.
Define $T\colon \mathcal{P}_m(\mathbf{R}) \to \mathcal{P}_m(\mathbf{R})$ by

\[ Tp = \bigl((x^2 + 5x + 7)p\bigr)''. \]

Multiplying a nonzero polynomial by $x^2 + 5x + 7$ increases the degree by 2, and
then differentiating twice reduces the degree by 2. Thus $T$ is indeed a linear map
from $\mathcal{P}_m(\mathbf{R})$ to itself.

Every polynomial whose second derivative equals 0 is of the form $ax + b$,
where $a, b \in \mathbf{R}$. If $Tp = 0$, then $(x^2 + 5x + 7)p$ has this form, which forces $p = 0$
(otherwise the product would have degree at least 2). Thus $\operatorname{null} T = \{0\}$. Hence
$T$ is injective.

Thus $T$ is surjective (by 3.65), which means that there exists a polynomial
$p \in \mathcal{P}_m(\mathbf{R})$ such that $\bigl((x^2 + 5x + 7)p\bigr)'' = q$, as claimed in the title of this example.
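The existence argument above can also be made concrete. The sketch below is an illustration added here, not the book's method: it represents a polynomial in $\mathcal{P}_m(\mathbf{R})$ by its list of coefficients (constant term first), builds the matrix of $T$ with respect to the standard basis, and solves $Tp = q$ by Gauss-Jordan elimination over exact fractions. The helper names `poly_mul`, `second_derivative`, and `solve` are assumptions made for this example.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def second_derivative(p):
    """Differentiate a coefficient list twice."""
    return [k * (k - 1) * p[k] for k in range(2, len(p))]

def T(p):
    """T p = ((x^2 + 5x + 7) p)'' on coefficient lists of length m + 1."""
    return second_derivative(poly_mul([7, 5, 1], p))

def solve(q):
    """Find p in P_m(R) with T p = q, where m = len(q) - 1."""
    n = len(q)
    # Column k of the matrix of T holds the coefficients of T(x^k).
    cols = [T([Fraction(int(i == k)) for i in range(n)]) for k in range(n)]
    # Augmented matrix [M(T) | q], reduced by Gauss-Jordan elimination.
    aug = [[cols[k][j] for k in range(n)] + [Fraction(q[j])] for j in range(n)]
    for c in range(n):
        # A pivot always exists because T is injective, hence invertible (3.65).
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

q = [1, 0, 3]          # q(x) = 1 + 3x^2, so m = 2
p = solve(q)
assert T(p) == q       # p really satisfies ((x^2 + 5x + 7)p)'' = q
print(p)
```

The elimination never fails to find a pivot precisely because 3.65 guarantees that the injective map $T$ on $\mathcal{P}_m(\mathbf{R})$ is invertible.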


Exercise 35 in Section 6A gives a similar but more spectacular example of 
using 3.65. 

The hypothesis in the result below that dim V = dim W holds in the important 
special case in which V is finite-dimensional and W = V. Thus in that case, the 
equation ST = I implies that ST = TS, even though we do not have multiplicative 
commutativity of arbitrary linear maps from V to V. 


3.68 $ST = I \iff TS = I$ (on vector spaces of the same dimension)


Suppose $V$ and $W$ are finite-dimensional vector spaces of the same dimension,
$S \in \mathcal{L}(W, V)$, and $T \in \mathcal{L}(V, W)$. Then $ST = I$ if and only if $TS = I$.


Proof First suppose $ST = I$. If $v \in V$ and $Tv = 0$, then

\[ v = Iv = (ST)v = S(Tv) = S(0) = 0. \]

Thus $T$ is injective (by 3.15). Because $V$ and $W$ have the same dimension, this
implies that $T$ is invertible (by 3.65).
Now multiply both sides of the equation $ST = I$ by $T^{-1}$ on the right, getting

\[ S = T^{-1}. \]

Thus $TS = TT^{-1} = I$, as desired.

To prove the implication in the other direction, simply reverse the roles of $S$
and $T$ (and $V$ and $W$) in the direction we have already proved, showing that if
$TS = I$, then $ST = I$.
Isomorphic Vector Spaces


The next definition captures the idea of two vector spaces that are essentially the 
same, except for the names of their elements. 


3.69 definition: isomorphism, isomorphic


• An isomorphism is an invertible linear map.


• Two vector spaces are called isomorphic if there is an isomorphism from
one vector space onto the other one.


Think of an isomorphism $T\colon V \to W$ as relabeling $v \in V$ as $Tv \in W$. This
viewpoint explains why two isomorphic vector spaces have the same vector space 
properties. The terms “isomorphism” and “invertible linear map” mean the same 
thing. Use “isomorphism” when you want to emphasize that the two spaces are 
essentially the same. 

It can be difficult to determine whether two mathematical structures (such as 
groups or topological spaces) are essentially the same, differing only in the names 
of the elements of underlying sets. However, the next result shows that we need 
to look at only a single number (the dimension) to determine whether two vector 
spaces are isomorphic. 


3.70 dimension shows whether vector spaces are isomorphic 


Two finite-dimensional vector spaces over F are isomorphic if and only if they 
have the same dimension. 


Proof First suppose V and W are isomorphic finite-dimensional vector spaces. 
Thus there exists an isomorphism T from V onto W. Because T is invertible, we 
have null T = {0} and range T = W. Thus 


\[ \dim \operatorname{null} T = 0 \quad\text{and}\quad \dim \operatorname{range} T = \dim W. \]


The formula 
dim V = dim null T + dim range T 


(the fundamental theorem of linear maps, which is 3.21) thus becomes the equation 
dim V = dim W, completing the proof in one direction. 

To prove the other direction, suppose $V$ and $W$ are finite-dimensional vector
spaces of the same dimension. Let $v_1, \dots, v_n$ be a basis of $V$ and $w_1, \dots, w_n$ be a
basis of $W$. Let $T \in \mathcal{L}(V, W)$ be defined by

\[ T(c_1 v_1 + \cdots + c_n v_n) = c_1 w_1 + \cdots + c_n w_n. \]

Then $T$ is a well-defined linear map because $v_1, \dots, v_n$ is a basis of $V$. Also, $T$
is surjective because $w_1, \dots, w_n$ spans $W$. Furthermore, $\operatorname{null} T = \{0\}$ because
$w_1, \dots, w_n$ is linearly independent. Thus $T$ is injective. Because $T$ is injective and
surjective, it is an isomorphism (see 3.63). Hence $V$ and $W$ are isomorphic.
The previous result implies that each finite-dimensional vector space $V$ is
isomorphic to $\mathbf{F}^n$, where $n = \dim V$. For example, if $m$ is a nonnegative integer,
then $\mathcal{P}_m(\mathbf{F})$ is isomorphic to $\mathbf{F}^{m+1}$.

Recall that the notation $\mathbf{F}^{m,n}$ denotes the vector space of $m$-by-$n$ matrices with
entries in $\mathbf{F}$. If $v_1, \dots, v_n$ is a basis of $V$ and $w_1, \dots, w_m$ is a basis of $W$, then for
each $T \in \mathcal{L}(V, W)$, we have a matrix $\mathcal{M}(T) \in \mathbf{F}^{m,n}$. Thus once bases have
been fixed for $V$ and $W$, $\mathcal{M}$ becomes a function from $\mathcal{L}(V, W)$ to $\mathbf{F}^{m,n}$. Notice
that 3.35 and 3.38 show that $\mathcal{M}$ is a linear map. This linear map is actually an
isomorphism, as we now show.

Every finite-dimensional vector space is isomorphic to some $\mathbf{F}^n$. Thus why not
just study $\mathbf{F}^n$ instead of more general vector spaces? To answer this question,
note that an investigation of $\mathbf{F}^n$ would soon lead to other vector spaces.
For example, we would encounter the null space and range of linear maps.
Although each of these vector spaces is isomorphic to some $\mathbf{F}^n$, thinking of
them that way often adds complexity but no new insight.


3.71 $\mathcal{L}(V, W)$ and $\mathbf{F}^{m,n}$ are isomorphic


Suppose $v_1, \dots, v_n$ is a basis of $V$ and $w_1, \dots, w_m$ is a basis of $W$. Then $\mathcal{M}$ is
an isomorphism between $\mathcal{L}(V, W)$ and $\mathbf{F}^{m,n}$.


Proof We already noted that $\mathcal{M}$ is linear. We need to prove that $\mathcal{M}$ is injective
and surjective.

We begin with injectivity. If $T \in \mathcal{L}(V, W)$ and $\mathcal{M}(T) = 0$, then $Tv_k = 0$ for
each $k = 1, \dots, n$. Because $v_1, \dots, v_n$ is a basis of $V$, this implies $T = 0$. Thus $\mathcal{M}$
is injective (by 3.15).

To prove that $\mathcal{M}$ is surjective, suppose $A \in \mathbf{F}^{m,n}$. By the linear map lemma
(3.4), there exists $T \in \mathcal{L}(V, W)$ such that

\[ Tv_k = \sum_{j=1}^{m} A_{j,k} w_j \]

for each $k = 1, \dots, n$. Because $\mathcal{M}(T)$ equals $A$, the range of $\mathcal{M}$ equals $\mathbf{F}^{m,n}$, as
desired.


Now we can determine the dimension of the vector space of linear maps from 
one finite-dimensional vector space to another. 


3.72 $\dim \mathcal{L}(V, W) = (\dim V)(\dim W)$


Suppose $V$ and $W$ are finite-dimensional. Then $\mathcal{L}(V, W)$ is finite-dimensional
and

\[ \dim \mathcal{L}(V, W) = (\dim V)(\dim W). \]


Proof The desired result follows from 3.71, 3.70, and 3.40.
Linear Maps Thought of as Matrix Multiplication


Previously we defined the matrix of a linear map. Now we define the matrix of a 
vector. 


3.73 definition: matrix of a vector, $\mathcal{M}(v)$


Suppose $v \in V$ and $v_1, \dots, v_n$ is a basis of $V$. The matrix of $v$ with respect to
this basis is the $n$-by-1 matrix

\[ \mathcal{M}(v) = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}, \]

where $b_1, \dots, b_n$ are the scalars such that

\[ v = b_1 v_1 + \cdots + b_n v_n. \]


The matrix $\mathcal{M}(v)$ of a vector $v \in V$ depends on the basis $v_1, \dots, v_n$ of $V$, as
well as on $v$. However, the basis should be clear from the context and thus it is
not included in the notation.


3.74 example: matrix of a vector


• The matrix of the polynomial $2 - 7x + 5x^3 + x^4$ with respect to the standard
basis of $\mathcal{P}_4(\mathbf{R})$ is

\[ \begin{pmatrix} 2 \\ -7 \\ 0 \\ 5 \\ 1 \end{pmatrix}. \]


• The matrix of a vector $x \in \mathbf{F}^n$ with respect to the standard basis is obtained by
writing the coordinates of $x$ as the entries in an $n$-by-1 matrix. In other words,
if $x = (x_1, \dots, x_n) \in \mathbf{F}^n$, then

\[ \mathcal{M}(x) = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]


Occasionally we want to think of elements of $V$ as relabeled to be $n$-by-1
matrices. Once a basis $v_1, \dots, v_n$ is chosen, the function $\mathcal{M}$ that takes $v \in V$ to
$\mathcal{M}(v)$ is an isomorphism of $V$ onto $\mathbf{F}^{n,1}$ that implements this relabeling.

Recall that if $A$ is an $m$-by-$n$ matrix, then $A_{\cdot,k}$ denotes the $k$th column of $A$,
thought of as an $m$-by-1 matrix. In the next result, $\mathcal{M}(Tv_k)$ is computed with
respect to the basis $w_1, \dots, w_m$ of $W$.
3.75 $\mathcal{M}(T)_{\cdot,k} = \mathcal{M}(Tv_k)$


Suppose $T \in \mathcal{L}(V, W)$ and $v_1, \dots, v_n$ is a basis of $V$ and $w_1, \dots, w_m$ is a basis
of $W$. Let $1 \le k \le n$. Then the $k$th column of $\mathcal{M}(T)$, which is denoted by
$\mathcal{M}(T)_{\cdot,k}$, equals $\mathcal{M}(Tv_k)$.


Proof The desired result follows immediately from the definitions of $\mathcal{M}(T)$ and
$\mathcal{M}(Tv_k)$.


The next result shows how the notions of the matrix of a linear map, the matrix 
of a vector, and matrix multiplication fit together. 


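3.76 linear maps act like matrix multiplication


Suppose $T \in \mathcal{L}(V, W)$ and $v \in V$. Suppose $v_1, \dots, v_n$ is a basis of $V$ and
$w_1, \dots, w_m$ is a basis of $W$. Then

\[ \mathcal{M}(Tv) = \mathcal{M}(T)\mathcal{M}(v). \]


Proof Suppose $v = b_1 v_1 + \cdots + b_n v_n$, where $b_1, \dots, b_n \in \mathbf{F}$. Thus

3.77 $Tv = b_1 Tv_1 + \cdots + b_n Tv_n.$

Hence

\[
\begin{aligned}
\mathcal{M}(Tv) &= b_1 \mathcal{M}(Tv_1) + \cdots + b_n \mathcal{M}(Tv_n) \\
&= b_1 \mathcal{M}(T)_{\cdot,1} + \cdots + b_n \mathcal{M}(T)_{\cdot,n} \\
&= \mathcal{M}(T)\mathcal{M}(v),
\end{aligned}
\]

where the first equality follows from 3.77 and the linearity of $\mathcal{M}$, the second
equality comes from 3.75, and the last equality comes from 3.50.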
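As an illustration of 3.76 (a sketch added here, with the differentiation map chosen as an assumed example), take $D\colon \mathcal{P}_3(\mathbf{R}) \to \mathcal{P}_2(\mathbf{R})$ with the standard bases. The code builds $\mathcal{M}(D)$ column by column as in 3.75 and checks that multiplying by the coordinate column of a polynomial gives the coordinates of its derivative.

```python
def D(p):
    """Differentiate a polynomial in P_3(R) given as [a0, a1, a2, a3]; the result lies in P_2(R)."""
    return [k * p[k] for k in range(1, 4)]

# Coordinate columns of the standard basis 1, x, x^2, x^3 of P_3(R).
basis = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# By 3.75, column k of M(D) is the coordinate column of D applied to the k-th basis vector.
columns = [D(e) for e in basis]
M_D = [[columns[k][j] for k in range(4)] for j in range(3)]   # a 3-by-4 matrix

def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

v = [2, -7, 0, 5]                  # the polynomial 2 - 7x + 5x^3
assert mat_vec(M_D, v) == D(v)     # M(Dv) = M(D) M(v)
```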


Each $m$-by-$n$ matrix $A$ induces a linear map from $\mathbf{F}^{n,1}$ to $\mathbf{F}^{m,1}$, namely the
matrix multiplication function that takes $x \in \mathbf{F}^{n,1}$ to $Ax \in \mathbf{F}^{m,1}$. The result above
can be used to think of every linear map (from a finite-dimensional vector space
to another finite-dimensional vector space) as a matrix multiplication map after
suitable relabeling via the isomorphisms given by $\mathcal{M}$. Specifically, if $T \in \mathcal{L}(V, W)$
and we identify $v \in V$ with $\mathcal{M}(v) \in \mathbf{F}^{n,1}$, then the result above says that we can
identify $Tv$ with $\mathcal{M}(T)\mathcal{M}(v)$.

Because the result above allows us to think (via isomorphisms) of each linear
map as multiplication on $\mathbf{F}^{n,1}$ by some matrix $A$, keep in mind that the specific
matrix $A$ depends not only on the linear map but also on the choice of bases. One
of the themes of many of the most important results in later chapters will be the
choice of a basis that makes the matrix $A$ as simple as possible.

In this book, we concentrate on linear maps rather than on matrices. However, 
sometimes thinking of linear maps as matrices (or thinking of matrices as linear 
maps) gives important insights that we will find useful. 




Notice that no bases are in sight in the statement of the next result. Although
$\mathcal{M}(T)$ in the next result depends on a choice of bases of $V$ and $W$, the next result
shows that the column rank of $\mathcal{M}(T)$ is the same for all such choices (because
$\operatorname{range} T$ does not depend on a choice of basis).


3.78 dimension of range T equals column rank of M(T) 


Suppose $V$ and $W$ are finite-dimensional and $T \in \mathcal{L}(V, W)$. Then $\dim \operatorname{range} T$
equals the column rank of $\mathcal{M}(T)$.


Proof Suppose $v_1, \dots, v_n$ is a basis of $V$ and $w_1, \dots, w_m$ is a basis of $W$. The linear
map that takes $w \in W$ to $\mathcal{M}(w)$ is an isomorphism from $W$ onto the space $\mathbf{F}^{m,1}$
of $m$-by-1 column vectors. The restriction of this isomorphism to $\operatorname{range} T$ [which
equals $\operatorname{span}(Tv_1, \dots, Tv_n)$ by Exercise 10 in Section 3B] is an isomorphism from
$\operatorname{range} T$ onto $\operatorname{span}\bigl(\mathcal{M}(Tv_1), \dots, \mathcal{M}(Tv_n)\bigr)$. For each $k \in \{1, \dots, n\}$, the $m$-by-1
matrix $\mathcal{M}(Tv_k)$ equals column $k$ of $\mathcal{M}(T)$. Thus

\[ \dim \operatorname{range} T = \text{the column rank of } \mathcal{M}(T), \]


as desired. 


Change of Basis 
In Section 3C we defined the matrix

\[ \mathcal{M}\bigl(T, (v_1, \dots, v_n), (w_1, \dots, w_m)\bigr) \]

of a linear map $T$ from $V$ to a possibly different vector space $W$, where $v_1, \dots, v_n$
is a basis of $V$ and $w_1, \dots, w_m$ is a basis of $W$. For linear maps from a vector space
to itself, we usually use the same basis for both the domain vector space and the
target vector space. When using a single basis in both capacities, we often write
the basis only once. In other words, if $T \in \mathcal{L}(V)$ and $v_1, \dots, v_n$ is a basis of $V$,
then the notation $\mathcal{M}\bigl(T, (v_1, \dots, v_n)\bigr)$ is defined by the equation

\[ \mathcal{M}\bigl(T, (v_1, \dots, v_n)\bigr) = \mathcal{M}\bigl(T, (v_1, \dots, v_n), (v_1, \dots, v_n)\bigr). \]

If the basis $v_1, \dots, v_n$ is clear from the context, then we can write just $\mathcal{M}(T)$.


3.79 definition: identity matrix, $I$


Suppose $n$ is a positive integer. The $n$-by-$n$ matrix

\[ \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix} \]

with 1’s on the diagonal (the entries where the row number equals the column
number) and 0’s elsewhere is called the identity matrix and is denoted by $I$.


Section 3D _Invertibility and Isomorphisms 91 


In the definition above, the 0 in the lower left corner of the matrix indicates that 
all entries below the diagonal are 0, and the 0 in the upper right corner indicates 
that all entries above the diagonal are 0. 

With respect to each basis of $V$, the matrix of the identity operator $I \in \mathcal{L}(V)$
is the identity matrix $I$. Note that the symbol $I$ is used to denote both the identity
operator and the identity matrix. The context indicates which meaning of $I$ is
intended. For example, consider the equation $\mathcal{M}(I) = I$; on the left side $I$ denotes
the identity operator, and on the right side $I$ denotes the identity matrix.

If A is a square matrix (with entries in F, as usual) of the same size as I, then 
AI = IA = A, as you should verify. 


3.80 definition: invertible, inverse, $A^{-1}$


A square matrix $A$ is called invertible if there is a square matrix $B$ of the same
size such that $AB = BA = I$; we call $B$ the inverse of $A$ and denote it by $A^{-1}$.


The same proof as used in 3.60 shows that if $A$ is an invertible square matrix,
then there is a unique matrix $B$ such that $AB = BA = I$ (and thus the notation
$B = A^{-1}$ is justified).

Some mathematicians use the terms nonsingular and singular, which mean the
same as invertible and noninvertible.

If $A$ is an invertible matrix, then $(A^{-1})^{-1} = A$ because

\[ A^{-1}A = AA^{-1} = I. \]


Also, if $A$ and $C$ are invertible square matrices of the same size, then $AC$ is
invertible and $(AC)^{-1} = C^{-1}A^{-1}$ because

\[
\begin{aligned}
(AC)(C^{-1}A^{-1}) &= A(CC^{-1})A^{-1} \\
&= AIA^{-1} \\
&= AA^{-1} \\
&= I,
\end{aligned}
\]

and similarly $(C^{-1}A^{-1})(AC) = I$.
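The order reversal in $(AC)^{-1} = C^{-1}A^{-1}$ is easy to get wrong, so a small numerical check can help. The following sketch uses two 2-by-2 matrices chosen here purely for illustration and exact rational arithmetic; the helpers `matmul` and `inverse_2x2` are assumptions of this example, not notation from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse_2x2(M):
    """Inverse of a 2-by-2 matrix via the adjugate formula (raises if singular)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4, 5], [2, 3]]
C = [[1, 2], [3, 4]]

lhs = inverse_2x2(matmul(A, C))
assert lhs == matmul(inverse_2x2(C), inverse_2x2(A))   # (AC)^{-1} = C^{-1} A^{-1}
assert lhs != matmul(inverse_2x2(A), inverse_2x2(C))   # the other order fails for this pair
```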

The next result holds because we defined matrix multiplication to make it
true (see 3.43 and the material preceding it). Now we are just being more explicit
about the bases involved.


3.81 matrix of product of linear maps


Suppose $T \in \mathcal{L}(U, V)$ and $S \in \mathcal{L}(V, W)$. If $u_1, \dots, u_m$ is a basis of $U$, $v_1, \dots, v_n$
is a basis of $V$, and $w_1, \dots, w_p$ is a basis of $W$, then

\[
\begin{aligned}
\mathcal{M}\bigl(ST, (u_1, \dots, u_m), (w_1, \dots, w_p)\bigr) = {} & \mathcal{M}\bigl(S, (v_1, \dots, v_n), (w_1, \dots, w_p)\bigr) \\
& \cdot \mathcal{M}\bigl(T, (u_1, \dots, u_m), (v_1, \dots, v_n)\bigr).
\end{aligned}
\]


The next result deals with the matrix of the identity operator $I$ with respect
to two different bases. Note that the $k$th column of $\mathcal{M}\bigl(I, (u_1, \dots, u_n), (v_1, \dots, v_n)\bigr)$
consists of the scalars needed to write $u_k$ as a linear combination of the basis
$v_1, \dots, v_n$.

In the statement of the next result, $I$ denotes the identity operator from $V$ to $V$.
In the proof, $I$ also denotes the $n$-by-$n$ identity matrix.


3.82 matrix of identity operator with respect to two bases


Suppose that $u_1, \dots, u_n$ and $v_1, \dots, v_n$ are bases of $V$. Then the matrices

\[ \mathcal{M}\bigl(I, (u_1, \dots, u_n), (v_1, \dots, v_n)\bigr) \quad\text{and}\quad \mathcal{M}\bigl(I, (v_1, \dots, v_n), (u_1, \dots, u_n)\bigr) \]

are invertible, and each is the inverse of the other.


Proof In 3.81, replace $w_k$ with $u_k$, and replace $S$ and $T$ with $I$, getting

\[ I = \mathcal{M}\bigl(I, (v_1, \dots, v_n), (u_1, \dots, u_n)\bigr)\, \mathcal{M}\bigl(I, (u_1, \dots, u_n), (v_1, \dots, v_n)\bigr). \]

Now interchange the roles of the $u$’s and $v$’s, getting

\[ I = \mathcal{M}\bigl(I, (u_1, \dots, u_n), (v_1, \dots, v_n)\bigr)\, \mathcal{M}\bigl(I, (v_1, \dots, v_n), (u_1, \dots, u_n)\bigr). \]

These two equations above give the desired result.


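3.83 example: matrix of identity on $\mathbf{F}^2$ with respect to two bases


Consider the bases $(4, 2), (5, 3)$ and $(1, 0), (0, 1)$ of $\mathbf{F}^2$. Because
$I(4, 2) = 4(1, 0) + 2(0, 1)$ and $I(5, 3) = 5(1, 0) + 3(0, 1)$, we have

\[ \mathcal{M}\bigl(I, ((4, 2), (5, 3)), ((1, 0), (0, 1))\bigr) = \begin{pmatrix} 4 & 5 \\ 2 & 3 \end{pmatrix}. \]

The inverse of the matrix above is

\[ \begin{pmatrix} \frac{3}{2} & -\frac{5}{2} \\ -1 & 2 \end{pmatrix}, \]

as you should verify. Thus 3.82 implies that

\[ \mathcal{M}\bigl(I, ((1, 0), (0, 1)), ((4, 2), (5, 3))\bigr) = \begin{pmatrix} \frac{3}{2} & -\frac{5}{2} \\ -1 & 2 \end{pmatrix}. \]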
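One way to carry out the verification requested in the example above is to read each column of the inverse matrix as a list of coefficients and recombine the basis $(4, 2), (5, 3)$. The sketch below is an added illustration; the names `u1`, `u2`, `inv`, and `combo` are chosen here and do not appear in the text.

```python
from fractions import Fraction

u1, u2 = (4, 2), (5, 3)

# The claimed matrix of I with respect to the bases (1,0),(0,1) and (4,2),(5,3).
inv = [[Fraction(3, 2), Fraction(-5, 2)],
       [Fraction(-1),   Fraction(2)]]

def combo(a, b):
    """Return a*u1 + b*u2 as a vector in F^2."""
    return (a * u1[0] + b * u2[0], a * u1[1] + b * u2[1])

# Column 1 expresses (1, 0) in terms of u1, u2; column 2 expresses (0, 1).
assert combo(inv[0][0], inv[1][0]) == (1, 0)
assert combo(inv[0][1], inv[1][1]) == (0, 1)
```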


Our next result shows how the matrix of T changes when we change bases. In 
the next result, we have two different bases of V, each of which is used as a basis for 
the domain space and as a basis for the target space. Recall our shorthand notation 
that allows us to display a basis only once when it is used in both capacities: 


\[ \mathcal{M}\bigl(T, (u_1, \dots, u_n)\bigr) = \mathcal{M}\bigl(T, (u_1, \dots, u_n), (u_1, \dots, u_n)\bigr). \]
3.84 change-of-basis formula


Suppose $T \in \mathcal{L}(V)$. Suppose $u_1, \dots, u_n$ and $v_1, \dots, v_n$ are bases of $V$. Let

\[ A = \mathcal{M}\bigl(T, (u_1, \dots, u_n)\bigr) \quad\text{and}\quad B = \mathcal{M}\bigl(T, (v_1, \dots, v_n)\bigr) \]

and $C = \mathcal{M}\bigl(I, (u_1, \dots, u_n), (v_1, \dots, v_n)\bigr)$. Then

\[ A = C^{-1}BC. \]


Proof In 3.81, replace $w_k$ with $u_k$ and replace $S$ with $I$, getting

3.85 $A = C^{-1}\, \mathcal{M}\bigl(T, (u_1, \dots, u_n), (v_1, \dots, v_n)\bigr),$

where we have used 3.82.
Again use 3.81, this time replacing $w_k$ with $v_k$. Also replace $T$ with $I$ and
replace $S$ with $T$, getting

\[ \mathcal{M}\bigl(T, (u_1, \dots, u_n), (v_1, \dots, v_n)\bigr) = BC. \]

Substituting the equation above into 3.85 gives the equation $A = C^{-1}BC$.
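To see the change-of-basis formula in action, the sketch below reuses the bases of Example 3.83, with $v_1, v_2$ the standard basis of $\mathbf{F}^2$ and $u_1, u_2 = (4, 2), (5, 3)$. The operator $T(x, y) = (x + y, 2y)$ is an assumption made only for this illustration. The code computes $A = C^{-1}BC$ and then checks, column by column, that $A$ really records the coordinates of $Tu_k$ in the basis $u_1, u_2$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def T(v):
    """A sample operator on F^2 (hypothetical, for illustration): T(x, y) = (x + y, 2y)."""
    x, y = v
    return (x + y, 2 * y)

u = [(4, 2), (5, 3)]                       # the basis u_1, u_2
B = [[1, 1], [0, 2]]                       # M(T) with respect to the standard basis v_1, v_2
C = [[4, 5], [2, 3]]                       # M(I, (u_1, u_2), (v_1, v_2)); columns are u_1, u_2
C_inv = [[Fraction(3, 2), Fraction(-5, 2)],
         [Fraction(-1),   Fraction(2)]]    # inverse of C, as in Example 3.83

A = matmul(matmul(C_inv, B), C)            # the formula A = C^{-1} B C

# Column k of A should give the coordinates of T(u_k) in the basis u_1, u_2.
for k in range(2):
    a, b = A[0][k], A[1][k]
    recombined = (a * u[0][0] + b * u[1][0], a * u[0][1] + b * u[1][1])
    assert recombined == T(u[k])
```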
The proof of the next result is left as an exercise.


3.86 matrix of inverse equals inverse of matrix


Suppose that $v_1, \dots, v_n$ is a basis of $V$ and $T \in \mathcal{L}(V)$ is invertible. Then
$\mathcal{M}(T^{-1}) = \bigl(\mathcal{M}(T)\bigr)^{-1}$, where both matrices are with respect to the basis
$v_1, \dots, v_n$.


Exercises 3D 


1 Suppose $T \in \mathcal{L}(V, W)$ is invertible. Show that $T^{-1}$ is invertible and
$(T^{-1})^{-1} = T$.


2 Suppose $T \in \mathcal{L}(U, V)$ and $S \in \mathcal{L}(V, W)$ are both invertible linear maps.
Prove that $ST \in \mathcal{L}(U, W)$ is invertible and that $(ST)^{-1} = T^{-1}S^{-1}$.


3 Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V)$. Prove that the following
are equivalent.
(a) $T$ is invertible.
(b) $Tv_1, \dots, Tv_n$ is a basis of $V$ for every basis $v_1, \dots, v_n$ of $V$.
(c) $Tv_1, \dots, Tv_n$ is a basis of $V$ for some basis $v_1, \dots, v_n$ of $V$.


4 Suppose $V$ is finite-dimensional and $\dim V > 1$. Prove that the set of
noninvertible linear maps from $V$ to itself is not a subspace of $\mathcal{L}(V)$.


5 Suppose $V$ is finite-dimensional, $U$ is a subspace of $V$, and $S \in \mathcal{L}(U, V)$.
Prove that there exists an invertible linear map $T$ from $V$ to itself such that
$Tu = Su$ for every $u \in U$ if and only if $S$ is injective.


6 Suppose that $W$ is finite-dimensional and $S, T \in \mathcal{L}(V, W)$. Prove that
$\operatorname{null} S = \operatorname{null} T$ if and only if there exists an invertible $E \in \mathcal{L}(W)$ such that
$S = ET$.


7 Suppose that $V$ is finite-dimensional and $S, T \in \mathcal{L}(V, W)$. Prove that
$\operatorname{range} S = \operatorname{range} T$ if and only if there exists an invertible $E \in \mathcal{L}(V)$ such
that $S = TE$.


8 Suppose $V$ and $W$ are finite-dimensional and $S, T \in \mathcal{L}(V, W)$. Prove that
there exist invertible $E_1 \in \mathcal{L}(V)$ and $E_2 \in \mathcal{L}(W)$ such that $S = E_2 T E_1$ if
and only if $\dim \operatorname{null} S = \dim \operatorname{null} T$.


9 Suppose $V$ is finite-dimensional and $T\colon V \to W$ is a surjective linear map
of $V$ onto $W$. Prove that there is a subspace $U$ of $V$ such that $T|_U$ is an
isomorphism of $U$ onto $W$.

Here $T|_U$ means the function $T$ restricted to $U$. Thus $T|_U$ is the function
whose domain is $U$, with $T|_U$ defined by $T|_U(u) = Tu$ for every $u \in U$.


10 Suppose $V$ and $W$ are finite-dimensional and $U$ is a subspace of $V$. Let

\[ \mathcal{E} = \{T \in \mathcal{L}(V, W) : U \subseteq \operatorname{null} T\}. \]

(a) Show that $\mathcal{E}$ is a subspace of $\mathcal{L}(V, W)$.
(b) Find a formula for $\dim \mathcal{E}$ in terms of $\dim V$, $\dim W$, and $\dim U$.

Hint: Define $\Phi\colon \mathcal{L}(V, W) \to \mathcal{L}(U, W)$ by $\Phi(T) = T|_U$. What is $\operatorname{null} \Phi$?
What is $\operatorname{range} \Phi$?


11 Suppose $V$ is finite-dimensional and $S, T \in \mathcal{L}(V)$. Prove that

\[ ST \text{ is invertible} \iff S \text{ and } T \text{ are invertible}. \]


12 Suppose $V$ is finite-dimensional and $S, T, U \in \mathcal{L}(V)$ and $STU = I$. Show
that $T$ is invertible and that $T^{-1} = US$.


13 Show that the result in Exercise 12 can fail without the hypothesis that $V$ is
finite-dimensional.


14 Prove or give a counterexample: If $V$ is a finite-dimensional vector space
and $R, S, T \in \mathcal{L}(V)$ are such that $RST$ is surjective, then $S$ is injective.


15 Suppose $T \in \mathcal{L}(V)$ and $v_1, \dots, v_m$ is a list in $V$ such that $Tv_1, \dots, Tv_m$ spans $V$.
Prove that $v_1, \dots, v_m$ spans $V$.


16 Prove that every linear map from $\mathbf{F}^{n,1}$ to $\mathbf{F}^{m,1}$ is given by a matrix
multiplication. In other words, prove that if $T \in \mathcal{L}(\mathbf{F}^{n,1}, \mathbf{F}^{m,1})$, then there exists an
$m$-by-$n$ matrix $A$ such that $Tx = Ax$ for every $x \in \mathbf{F}^{n,1}$.


17 Suppose $V$ is finite-dimensional and $S \in \mathcal{L}(V)$. Define $\mathcal{A} \in \mathcal{L}\bigl(\mathcal{L}(V)\bigr)$ by

\[ \mathcal{A}(T) = ST \]

for $T \in \mathcal{L}(V)$.

(a) Show that $\dim \operatorname{null} \mathcal{A} = (\dim V)(\dim \operatorname{null} S)$.
(b) Show that $\dim \operatorname{range} \mathcal{A} = (\dim V)(\dim \operatorname{range} S)$.


18 Show that $V$ and $\mathcal{L}(\mathbf{F}, V)$ are isomorphic vector spaces.


19 Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V)$. Prove that $T$ has the same
matrix with respect to every basis of $V$ if and only if $T$ is a scalar multiple
of the identity operator.


20 Suppose $q \in \mathcal{P}(\mathbf{R})$. Prove that there exists a polynomial $p \in \mathcal{P}(\mathbf{R})$ such
that

\[ q(x) = (x^2 + x)p''(x) + 2xp'(x) + p(3) \]

for all $x \in \mathbf{R}$.


21 Suppose $n$ is a positive integer and $A_{j,k} \in \mathbf{F}$ for all $j, k = 1, \dots, n$. Prove that
the following are equivalent (note that in both parts below, the number of
equations equals the number of variables).

(a) The trivial solution $x_1 = \cdots = x_n = 0$ is the only solution to the
homogeneous system of equations

\[
\begin{aligned}
\sum_{k=1}^{n} A_{1,k} x_k &= 0 \\
&\;\;\vdots \\
\sum_{k=1}^{n} A_{n,k} x_k &= 0.
\end{aligned}
\]

(b) For every $c_1, \dots, c_n \in \mathbf{F}$, there exists a solution to the system of equations

\[
\begin{aligned}
\sum_{k=1}^{n} A_{1,k} x_k &= c_1 \\
&\;\;\vdots \\
\sum_{k=1}^{n} A_{n,k} x_k &= c_n.
\end{aligned}
\]


22 Suppose $T \in \mathcal{L}(V)$ and $v_1, \dots, v_n$ is a basis of $V$. Prove that

\[ \mathcal{M}\bigl(T, (v_1, \dots, v_n)\bigr) \text{ is invertible} \iff T \text{ is invertible}. \]


23 Suppose that $u_1, \dots, u_n$ and $v_1, \dots, v_n$ are bases of $V$. Let $T \in \mathcal{L}(V)$ be such
that $Tv_k = u_k$ for each $k = 1, \dots, n$. Prove that

\[ \mathcal{M}\bigl(T, (v_1, \dots, v_n)\bigr) = \mathcal{M}\bigl(I, (u_1, \dots, u_n), (v_1, \dots, v_n)\bigr). \]


24 Suppose $A$ and $B$ are square matrices of the same size and $AB = I$. Prove
that $BA = I$.
Products of Vector Spaces


As usual when dealing with more than one vector space, all vector spaces in use 
should be over the same field. 


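3.87 definition: product of vector spaces


Suppose $V_1, \dots, V_m$ are vector spaces over $\mathbf{F}$.


• The product $V_1 \times \cdots \times V_m$ is defined by

\[ V_1 \times \cdots \times V_m = \{(v_1, \dots, v_m) : v_1 \in V_1, \dots, v_m \in V_m\}. \]

• Addition on $V_1 \times \cdots \times V_m$ is defined by

\[ (u_1, \dots, u_m) + (v_1, \dots, v_m) = (u_1 + v_1, \dots, u_m + v_m). \]

• Scalar multiplication on $V_1 \times \cdots \times V_m$ is defined by

\[ \lambda(v_1, \dots, v_m) = (\lambda v_1, \dots, \lambda v_m). \]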


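3.88 example: product of the vector spaces $\mathcal{P}_5(\mathbf{R})$ and $\mathbf{R}^3$


Elements of $\mathcal{P}_5(\mathbf{R}) \times \mathbf{R}^3$ are lists of length two, with the first item in the list
an element of $\mathcal{P}_5(\mathbf{R})$ and the second item in the list an element of $\mathbf{R}^3$.

For example, $\bigl(5 - 6x + 4x^2, (3, 8, 7)\bigr)$ and $\bigl(x + 9x^5, (2, 2, 2)\bigr)$ are elements of
$\mathcal{P}_5(\mathbf{R}) \times \mathbf{R}^3$. Their sum is defined by

\[
\bigl(5 - 6x + 4x^2, (3, 8, 7)\bigr) + \bigl(x + 9x^5, (2, 2, 2)\bigr)
= \bigl(5 - 5x + 4x^2 + 9x^5, (5, 10, 9)\bigr).
\]

Also, $2\bigl(5 - 6x + 4x^2, (3, 8, 7)\bigr) = \bigl(10 - 12x + 8x^2, (6, 16, 14)\bigr)$.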
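The componentwise operations of 3.87 translate directly into code. In the sketch below (an added illustration), an element of $\mathcal{P}_5(\mathbf{R}) \times \mathbf{R}^3$ is modeled as a pair consisting of a list of six coefficients (constant term first) and a tuple of three numbers; this representation and the helper names `add` and `scale` are assumptions made here.

```python
def add(a, b):
    """Componentwise addition in P_5(R) x R^3."""
    (p, x), (q, y) = a, b
    return ([s + t for s, t in zip(p, q)], tuple(s + t for s, t in zip(x, y)))

def scale(c, a):
    """Scalar multiplication in P_5(R) x R^3."""
    p, x = a
    return ([c * s for s in p], tuple(c * s for s in x))

a = ([5, -6, 4, 0, 0, 0], (3, 8, 7))   # (5 - 6x + 4x^2, (3, 8, 7))
b = ([0, 1, 0, 0, 0, 9], (2, 2, 2))    # (x + 9x^5, (2, 2, 2))

# These reproduce the sum and scalar multiple computed in Example 3.88.
assert add(a, b) == ([5, -5, 4, 0, 0, 9], (5, 10, 9))
assert scale(2, a) == ([10, -12, 8, 0, 0, 0], (6, 16, 14))
```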


The next result should be interpreted to mean that the product of vector spaces 
is a vector space with the operations of addition and scalar multiplication as 
defined by 3.87. 


3.89 product of vector spaces is a vector space 


Suppose $V_1, \dots, V_m$ are vector spaces over $\mathbf{F}$. Then $V_1 \times \cdots \times V_m$ is a vector
space over $\mathbf{F}$.


The proof of the result above is left to the reader. Note that the additive identity
of $V_1 \times \cdots \times V_m$ is $(0, \dots, 0)$, where the 0 in the $k$th slot is the additive identity of $V_k$.
The additive inverse of $(v_1, \dots, v_m) \in V_1 \times \cdots \times V_m$ is $(-v_1, \dots, -v_m)$.
3.90 example: $\mathbf{R}^2 \times \mathbf{R}^3 \ne \mathbf{R}^5$ but $\mathbf{R}^2 \times \mathbf{R}^3$ is isomorphic to $\mathbf{R}^5$


Elements of the vector space $\mathbf{R}^2 \times \mathbf{R}^3$ are lists

\[ \bigl((x_1, x_2), (x_3, x_4, x_5)\bigr), \]

where $x_1, x_2, x_3, x_4, x_5 \in \mathbf{R}$. Elements of $\mathbf{R}^5$ are lists

\[ (x_1, x_2, x_3, x_4, x_5), \]

where $x_1, x_2, x_3, x_4, x_5 \in \mathbf{R}$.

Although elements of $\mathbf{R}^2 \times \mathbf{R}^3$ and $\mathbf{R}^5$ look similar, they are not the same kind
of object. Elements of $\mathbf{R}^2 \times \mathbf{R}^3$ are lists of length two (with the first item itself a
list of length two and the second item a list of length three), and elements of $\mathbf{R}^5$
are lists of length five. Thus $\mathbf{R}^2 \times \mathbf{R}^3$ does not equal $\mathbf{R}^5$.

The linear map

\[ \bigl((x_1, x_2), (x_3, x_4, x_5)\bigr) \mapsto (x_1, x_2, x_3, x_4, x_5) \]

is an isomorphism of the vector space $\mathbf{R}^2 \times \mathbf{R}^3$ onto the vector space $\mathbf{R}^5$. Thus
these two vector spaces are isomorphic, although they are not equal.

This isomorphism is so natural that we should think of it as a relabeling.
Some people informally say that $\mathbf{R}^2 \times \mathbf{R}^3$ equals $\mathbf{R}^5$, which is not technically
correct but which captures the spirit of identification via relabeling.


The next example illustrates the idea that we will use in the proof of 3.92. 


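3.91 example: a basis of $\mathcal{P}_2(\mathbf{R}) \times \mathbf{R}^2$

Consider this list of length five of elements of $\mathcal{P}_2(\mathbf{R}) \times \mathbf{R}^2$:

\[ \bigl(1, (0, 0)\bigr),\ \bigl(x, (0, 0)\bigr),\ \bigl(x^2, (0, 0)\bigr),\ \bigl(0, (1, 0)\bigr),\ \bigl(0, (0, 1)\bigr). \]

The list above is linearly independent and it spans $\mathcal{P}_2(\mathbf{R}) \times \mathbf{R}^2$. Thus it is a basis
of $\mathcal{P}_2(\mathbf{R}) \times \mathbf{R}^2$.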


3.92 dimension of a product is the sum of dimensions 


Suppose $V_1, \dots, V_m$ are finite-dimensional vector spaces. Then $V_1 \times \cdots \times V_m$ is
finite-dimensional and

\[ \dim(V_1 \times \cdots \times V_m) = \dim V_1 + \cdots + \dim V_m. \]


Proof Choose a basis of each $V_k$. For each basis vector of each $V_k$, consider the
element of $V_1 \times \cdots \times V_m$ that equals the basis vector in the $k$th slot and 0 in the other
slots. The list of all such vectors is linearly independent and spans $V_1 \times \cdots \times V_m$.
Thus it is a basis of $V_1 \times \cdots \times V_m$. The length of this basis is $\dim V_1 + \cdots + \dim V_m$,
as desired.


In the next result, the map $\Gamma$ is surjective by the definition of $V_1 + \cdots + V_m$. Thus
the last word in the result below could be changed from “injective” to “invertible”.


3.93 products and direct sums


Suppose that $V_1, \dots, V_m$ are subspaces of $V$. Define a linear map
$\Gamma\colon V_1 \times \cdots \times V_m \to V_1 + \cdots + V_m$ by

\[ \Gamma(v_1, \dots, v_m) = v_1 + \cdots + v_m. \]

Then $V_1 + \cdots + V_m$ is a direct sum if and only if $\Gamma$ is injective.


Proof By 3.15, $\Gamma$ is injective if and only if the only way to write 0 as a sum
$v_1 + \cdots + v_m$, where each $v_k$ is in $V_k$, is by taking each $v_k$ equal to 0. Thus 1.45
shows that $\Gamma$ is injective if and only if $V_1 + \cdots + V_m$ is a direct sum, as desired.


3.94 a sum is a direct sum if and only if dimensions add up


Suppose $V$ is finite-dimensional and $V_1, \dots, V_m$ are subspaces of $V$. Then
$V_1 + \cdots + V_m$ is a direct sum if and only if

\[ \dim(V_1 + \cdots + V_m) = \dim V_1 + \cdots + \dim V_m. \]


Proof The map $\Gamma$ in 3.93 is surjective. Thus by the fundamental theorem of
linear maps (3.21), $\Gamma$ is injective if and only if

\[ \dim(V_1 + \cdots + V_m) = \dim(V_1 \times \cdots \times V_m). \]

Combining 3.93 and 3.92 now shows that $V_1 + \cdots + V_m$ is a direct sum if and only
if

\[ \dim(V_1 + \cdots + V_m) = \dim V_1 + \cdots + \dim V_m, \]


as desired. 
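The criterion in 3.94 lends itself to a computational test when each subspace of $\mathbf{F}^n$ is given by a spanning list: the dimension of a span is the rank of the matrix whose rows are the spanning vectors. The sketch below is an added illustration under that assumption; `rank` and `is_direct_sum` are hypothetical helper names, and the rank is computed by Gaussian elimination over exact fractions.

```python
from fractions import Fraction

def rank(rows):
    """Dimension of the span of a list of vectors, by Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Look for a row at or below position rk with a nonzero entry in column c.
        pivot = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][c] / m[rk][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def is_direct_sum(*spanning_lists):
    """Test 3.94: dim(V_1 + ... + V_m) == dim V_1 + ... + dim V_m."""
    dim_of_sum = rank([v for lst in spanning_lists for v in lst])
    return dim_of_sum == sum(rank(lst) for lst in spanning_lists)

xy_plane = [(1, 0, 0), (0, 1, 0)]
z_axis = [(0, 0, 1)]
diagonal = [(1, 1, 0)]               # a line lying inside the xy-plane

assert is_direct_sum(xy_plane, z_axis)        # 3 == 2 + 1
assert not is_direct_sum(xy_plane, diagonal)  # 2 != 2 + 1
```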


In the special case $m = 2$, an alternative proof that $V_1 + V_2$ is a direct sum if
and only if $\dim(V_1 + V_2) = \dim V_1 + \dim V_2$ can be obtained by combining 1.46
and 2.43.


Quotient Spaces 


We begin our approach to quotient spaces by defining the sum of a vector and a 
subset. 


3.95 notation: $v + U$


Suppose $v \in V$ and $U \subseteq V$. Then $v + U$ is the subset of $V$ defined by

\[ v + U = \{v + u : u \in U\}. \]
3.96 example: sum of a vector and a one-dimensional subspace of $\mathbf{R}^2$


Suppose

\[ U = \{(x, 2x) \in \mathbf{R}^2 : x \in \mathbf{R}\}. \]

Hence $U$ is the line in $\mathbf{R}^2$ through the origin with slope 2. Thus

\[ (17, 20) + U \]

is the line in $\mathbf{R}^2$ that contains the point $(17, 20)$ and has slope 2.

[Figure: the lines $U$ and $(17, 20) + U$ in $\mathbf{R}^2$, with the points $(10, 20)$ and $(17, 20)$ marked. Caption: $(17, 20) + U$ is parallel to the subspace $U$.]

Because

\[ (10, 20) \in U \quad\text{and}\quad (17, 20) \in (17, 20) + U, \]

we see that $(17, 20) + U$ is obtained by moving $U$
to the right by 7 units.


For $v \in V$ and $U$ a subset of $V$, the set $v + U$ is said to be a translate of $U$.


3.98 example: translates 


• If $U$ is the line in $\mathbf{R}^2$ defined by $U = \{(x, 2x) \in \mathbf{R}^2 : x \in \mathbf{R}\}$, then all lines in $\mathbf{R}^2$ with slope 2 are translates of $U$. See Example 3.96 above for a drawing of $U$ and one of its translates.

• More generally, if $U$ is a line in $\mathbf{R}^2$, then the set of all translates of $U$ is the set of all lines in $\mathbf{R}^2$ that are parallel to $U$.

• If $U = \{(x, y, 0) \in \mathbf{R}^3 : x, y \in \mathbf{R}\}$, then the translates of $U$ are the planes in $\mathbf{R}^3$ that are parallel to the $xy$-plane $U$.

• More generally, if $U$ is a plane in $\mathbf{R}^3$, then the set of all translates of $U$ is the set of all planes in $\mathbf{R}^3$ that are parallel to $U$ (see, for example, Exercise 7).


Suppose $U$ is a subspace of $V$. Then the quotient space $V/U$ is the set of all translates of $U$. Thus

$V/U = \{v + U : v \in V\}.$
3.100 example: quotient spaces

• If $U = \{(x, 2x) \in \mathbf{R}^2 : x \in \mathbf{R}\}$, then $\mathbf{R}^2/U$ is the set of all lines in $\mathbf{R}^2$ that have slope 2.

• If $U$ is a line in $\mathbf{R}^3$ containing the origin, then $\mathbf{R}^3/U$ is the set of all lines in $\mathbf{R}^3$ parallel to $U$.

• If $U$ is a plane in $\mathbf{R}^3$ containing the origin, then $\mathbf{R}^3/U$ is the set of all planes in $\mathbf{R}^3$ parallel to $U$.


Our next goal is to make V/U into a vector space. To do this, we will need the 
next result. 


3.101 two translates of a subspace are equal or disjoint 


Suppose $U$ is a subspace of $V$ and $v, w \in V$. Then

$v - w \in U \iff v + U = w + U \iff (v + U) \cap (w + U) \ne \emptyset.$


Proof First suppose $v - w \in U$. If $u \in U$, then

$v + u = w + ((v - w) + u) \in w + U.$

Thus $v + U \subseteq w + U$. Similarly, $w + U \subseteq v + U$. Thus $v + U = w + U$, completing the proof that $v - w \in U$ implies $v + U = w + U$.

The equation $v + U = w + U$ implies that $(v + U) \cap (w + U) \ne \emptyset$.

Now suppose $(v + U) \cap (w + U) \ne \emptyset$. Thus there exist $u_1, u_2 \in U$ such that

$v + u_1 = w + u_2.$

Thus $v - w = u_2 - u_1$. Hence $v - w \in U$, showing that $(v + U) \cap (w + U) \ne \emptyset$ implies $v - w \in U$, which completes the proof.
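
Result 3.101 gives a practical test for equality of translates: compare representatives by checking whether their difference lies in $U$. The sketch below applies this to the line $U$ of Example 3.96; the function names `in_U` and `same_translate` are hypothetical and chosen only for this illustration.

```python
def in_U(p):
    """Membership test for U = {(x, 2x)}, the line through the origin with slope 2."""
    x, y = p
    return y == 2 * x

def same_translate(v, w):
    """By 3.101, v + U = w + U exactly when v - w is in U."""
    return in_U((v[0] - w[0], v[1] - w[1]))

print(same_translate((17, 20), (7, 0)))    # difference (10, 20) lies in U
print(same_translate((17, 20), (10, 20)))  # difference (7, 0) does not lie in U
```

By 3.101, the second pair of translates is not merely different but disjoint.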


Now we can define addition and scalar multiplication on V/U. 

3.102 definition: addition and scalar multiplication on $V/U$

Suppose $U$ is a subspace of $V$. Then addition and scalar multiplication are defined on $V/U$ by

$(v + U) + (w + U) = (v + w) + U$
$\lambda(v + U) = (\lambda v) + U$

for all $v, w \in V$ and all $\lambda \in \mathbf{F}$.


As part of the proof of the next result, we will show that the definitions above 
make sense. 
3.103 quotient space is a vector space

Suppose $U$ is a subspace of $V$. Then $V/U$, with the operations of addition and scalar multiplication as defined above, is a vector space.

Proof The potential problem with the definitions above of addition and scalar multiplication on $V/U$ is that the representation of a translate of $U$ is not unique. Specifically, suppose $v_1, v_2, w_1, w_2 \in V$ are such that

$v_1 + U = v_2 + U$ and $w_1 + U = w_2 + U.$

To show that the definition of addition on $V/U$ given above makes sense, we must show that $(v_1 + w_1) + U = (v_2 + w_2) + U$.

By 3.101, we have

$v_1 - v_2 \in U$ and $w_1 - w_2 \in U.$

Because $U$ is a subspace of $V$ and thus is closed under addition, this implies that $(v_1 - v_2) + (w_1 - w_2) \in U$. Thus $(v_1 + w_1) - (v_2 + w_2) \in U$. Using 3.101 again, we see that

$(v_1 + w_1) + U = (v_2 + w_2) + U,$

as desired. Thus the definition of addition on $V/U$ makes sense.

Similarly, suppose $\lambda \in \mathbf{F}$. We are still assuming that $v_1 + U = v_2 + U$. Because $U$ is a subspace of $V$ and thus is closed under scalar multiplication, we have $\lambda(v_1 - v_2) \in U$. Thus $\lambda v_1 - \lambda v_2 \in U$. Hence 3.101 implies that

$(\lambda v_1) + U = (\lambda v_2) + U.$

Thus the definition of scalar multiplication on $V/U$ makes sense.

Now that addition and scalar multiplication have been defined on $V/U$, the verification that these operations make $V/U$ into a vector space is straightforward and is left to the reader. Note that the additive identity of $V/U$ is $0 + U$ (which equals $U$) and that the additive inverse of $v + U$ is $(-v) + U$.
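
The well-definedness argument above can be checked on a concrete case. The sketch below again uses the line $U = \{(x, 2x)\}$ in $\mathbf{R}^2$; labeling the translate of $(x, y)$ by the number $y - 2x$ is an assumption of this illustration (it works because two vectors get the same label exactly when their difference lies in $U$), as are the names `coset_label` and `add`.

```python
def coset_label(v):
    """Label the translate v + U of U = {(x, 2x)} by the number y - 2x.

    Two vectors receive the same label exactly when their difference is in U,
    so by 3.101 the label determines the translate.
    """
    x, y = v
    return y - 2 * x

def add(v, w):
    """Ordinary vector addition in R^2."""
    return (v[0] + w[0], v[1] + w[1])

# Two representatives of each translate (each pair differs by an element of U).
v1, v2 = (17, 20), (7, 0)    # v1 - v2 = (10, 20) is in U
w1, w2 = (1, 5), (4, 11)     # w1 - w2 = (-3, -6) is in U

assert coset_label(v1) == coset_label(v2)
assert coset_label(w1) == coset_label(w2)

# The sum of the two translates does not depend on the representatives chosen.
print(coset_label(add(v1, w1)) == coset_label(add(v2, w2)))
```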


The next concept will lead to a computation of the dimension of V/U. 

3.104 definition: quotient map, $\pi$

Suppose $U$ is a subspace of $V$. The quotient map $\pi \colon V \to V/U$ is the linear map defined by

$\pi(v) = v + U$

for each $v \in V$.

The reader should verify that $\pi$ is indeed a linear map. Although $\pi$ depends on $U$ as well as $V$, these spaces are left out of the notation because they should be clear from the context.
3.105 dimension of quotient space

Suppose $V$ is finite-dimensional and $U$ is a subspace of $V$. Then

$\dim V/U = \dim V - \dim U.$

Proof Let $\pi$ denote the quotient map from $V$ to $V/U$. If $v \in V$, then $v + U = 0 + U$ if and only if $v \in U$ (by 3.101), which implies that $\operatorname{null} \pi = U$. The definition of $\pi$ implies $\operatorname{range} \pi = V/U$. The fundamental theorem of linear maps (3.21) now implies $\dim V = \dim U + \dim V/U$, which gives the desired result.


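Each linear map $T$ on $V$ induces a linear map $\tilde{T}$ on $V/(\operatorname{null} T)$, which we now define.

3.106 notation: $\tilde{T}$

Suppose $T \in \mathcal{L}(V, W)$. Define $\tilde{T} \colon V/(\operatorname{null} T) \to W$ by

$\tilde{T}(v + \operatorname{null} T) = Tv.$

To show that the definition of $\tilde{T}$ makes sense, suppose $u, v \in V$ are such that $u + \operatorname{null} T = v + \operatorname{null} T$. By 3.101, we have $u - v \in \operatorname{null} T$. Thus $T(u - v) = 0$. Hence $Tu = Tv$. Thus the definition of $\tilde{T}$ indeed makes sense. The routine verification that $\tilde{T}$ is a linear map from $V/(\operatorname{null} T)$ to $W$ is left to the reader.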

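The next result shows that we can think of $\tilde{T}$ as a modified version of $T$, with a domain that produces a one-to-one map.

3.107 null space and range of $\tilde{T}$

Suppose $T \in \mathcal{L}(V, W)$. Then

(a) $\tilde{T} \circ \pi = T$, where $\pi$ is the quotient map of $V$ onto $V/(\operatorname{null} T)$;

(b) $\tilde{T}$ is injective;

(c) $\operatorname{range} \tilde{T} = \operatorname{range} T$;

(d) $V/(\operatorname{null} T)$ and $\operatorname{range} T$ are isomorphic vector spaces.

Proof

(a) If $v \in V$, then $(\tilde{T} \circ \pi)(v) = \tilde{T}(\pi(v)) = \tilde{T}(v + \operatorname{null} T) = Tv$, as desired.

(b) Suppose $v \in V$ and $\tilde{T}(v + \operatorname{null} T) = 0$. Then $Tv = 0$. Thus $v \in \operatorname{null} T$. Hence 3.101 implies that $v + \operatorname{null} T = 0 + \operatorname{null} T$. This implies that $\operatorname{null} \tilde{T} = \{0 + \operatorname{null} T\}$. Hence $\tilde{T}$ is injective, as desired.

(c) The definition of $\tilde{T}$ shows that $\operatorname{range} \tilde{T} = \operatorname{range} T$.

(d) Now (b) and (c) imply that if we think of $\tilde{T}$ as mapping into $\operatorname{range} T$, then $\tilde{T}$ is an isomorphism from $V/(\operatorname{null} T)$ onto $\operatorname{range} T$.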
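
A small worked instance may help make 3.106 and 3.107 concrete. The linear map $T$ below is a hypothetical choice made only for this illustration; its null space is the line $U$ of Example 3.96.

```latex
% Hypothetical illustration of 3.106 and 3.107; the map T is an assumption, not from the text.
\[
T \colon \mathbf{R}^2 \to \mathbf{R}, \qquad T(x, y) = y - 2x,
\qquad \operatorname{null} T = \{(x, 2x) : x \in \mathbf{R}\} = U.
\]
% Each translate of U is a line of slope 2, and \tilde{T} reads off its y-intercept:
\[
\tilde{T}\bigl((x, y) + U\bigr) = y - 2x,
\qquad\text{for example}\qquad
\tilde{T}\bigl((17, 20) + U\bigr) = 20 - 34 = -14 .
\]
% Well defined: (7, 0) lies in the same translate, and T(7, 0) = 0 - 14 = -14 as well.
% \tilde{T} is injective with range R, so R^2/U is isomorphic to R,
% consistent with 3.105: dim R^2/U = 2 - 1 = 1.
```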
Exercises 3E

Suppose $T$ is a function from $V$ to $W$. The graph of $T$ is the subset of $V \times W$ defined by

graph of $T = \{(v, Tv) \in V \times W : v \in V\}.$

Prove that $T$ is a linear map if and only if the graph of $T$ is a subspace of $V \times W$.

Formally, a function $T$ from $V$ to $W$ is a subset $T$ of $V \times W$ such that for each $v \in V$, there exists exactly one element $(v, w) \in T$. In other words, formally a function is what is called above its graph. We do not usually think of functions in this formal manner. However, if we do become formal, then this exercise could be rephrased as follows: Prove that a function $T$ from $V$ to $W$ is a linear map if and only if $T$ is a subspace of $V \times W$.

Suppose that $V_1, \dots, V_m$ are vector spaces such that $V_1 \times \cdots \times V_m$ is finite-dimensional. Prove that $V_k$ is finite-dimensional for each $k = 1, \dots, m$.

Suppose $V_1, \dots, V_m$ are vector spaces. Prove that $\mathcal{L}(V_1 \times \cdots \times V_m, W)$ and $\mathcal{L}(V_1, W) \times \cdots \times \mathcal{L}(V_m, W)$ are isomorphic vector spaces.

Suppose $W_1, \dots, W_m$ are vector spaces. Prove that $\mathcal{L}(V, W_1 \times \cdots \times W_m)$ and $\mathcal{L}(V, W_1) \times \cdots \times \mathcal{L}(V, W_m)$ are isomorphic vector spaces.

For $m$ a positive integer, define $V^m$ by

$V^m = \underbrace{V \times \cdots \times V}_{m \text{ times}}.$

Prove that $V^m$ and $\mathcal{L}(\mathbf{F}^m, V)$ are isomorphic vector spaces.

Suppose that $v, x$ are vectors in $V$ and that $U, W$ are subspaces of $V$ such that $v + U = x + W$. Prove that $U = W$.

Let $U = \{(x, y, z) \in \mathbf{R}^3 : 2x + 3y + 5z = 0\}$. Suppose $A \subseteq \mathbf{R}^3$. Prove that $A$ is a translate of $U$ if and only if there exists $c \in \mathbf{R}$ such that

$A = \{(x, y, z) \in \mathbf{R}^3 : 2x + 3y + 5z = c\}.$

(a) Suppose $T \in \mathcal{L}(V, W)$ and $c \in W$. Prove that $\{x \in V : Tx = c\}$ is either the empty set or is a translate of $\operatorname{null} T$.

(b) Explain why the set of solutions to a system of linear equations such as 3.27 is either the empty set or is a translate of some subspace of $\mathbf{F}^n$.

Prove that a nonempty subset $A$ of $V$ is a translate of some subspace of $V$ if and only if $\lambda v + (1 - \lambda)w \in A$ for all $v, w \in A$ and all $\lambda \in \mathbf{F}$.

Suppose $A_1 = v + U_1$ and $A_2 = w + U_2$ for some $v, w \in V$ and some subspaces $U_1, U_2$ of $V$. Prove that the intersection $A_1 \cap A_2$ is either a translate of some subspace of $V$ or is the empty set.
Suppose $U = \{(x_1, x_2, \dots) \in \mathbf{F}^\infty : x_k \ne 0 \text{ for only finitely many } k\}$.

(a) Show that $U$ is a subspace of $\mathbf{F}^\infty$.

(b) Prove that $\mathbf{F}^\infty/U$ is infinite-dimensional.

Suppose $v_1, \dots, v_m \in V$. Let

$A = \{\lambda_1 v_1 + \cdots + \lambda_m v_m : \lambda_1, \dots, \lambda_m \in \mathbf{F} \text{ and } \lambda_1 + \cdots + \lambda_m = 1\}.$

(a) Prove that $A$ is a translate of some subspace of $V$.

(b) Prove that if $B$ is a translate of some subspace of $V$ and $\{v_1, \dots, v_m\} \subseteq B$, then $A \subseteq B$.

(c) Prove that $A$ is a translate of some subspace of $V$ of dimension less than $m$.

Suppose $U$ is a subspace of $V$ such that $V/U$ is finite-dimensional. Prove that $V$ is isomorphic to $U \times (V/U)$.

Suppose $U$ and $W$ are subspaces of $V$ and $V = U \oplus W$. Suppose $w_1, \dots, w_m$ is a basis of $W$. Prove that $w_1 + U, \dots, w_m + U$ is a basis of $V/U$.

Suppose $U$ is a subspace of $V$ and $v_1 + U, \dots, v_m + U$ is a basis of $V/U$ and $u_1, \dots, u_n$ is a basis of $U$. Prove that $v_1, \dots, v_m, u_1, \dots, u_n$ is a basis of $V$.

Suppose $\varphi \in \mathcal{L}(V, \mathbf{F})$ and $\varphi \ne 0$. Prove that $\dim V/(\operatorname{null} \varphi) = 1$.

Suppose $U$ is a subspace of $V$ such that $\dim V/U = 1$. Prove that there exists $\varphi \in \mathcal{L}(V, \mathbf{F})$ such that $\operatorname{null} \varphi = U$.

Suppose that $U$ is a subspace of $V$ such that $V/U$ is finite-dimensional.

(a) Show that if $W$ is a finite-dimensional subspace of $V$ and $V = U + W$, then $\dim W \ge \dim V/U$.

(b) Prove that there exists a finite-dimensional subspace $W$ of $V$ such that $\dim W = \dim V/U$ and $V = U \oplus W$.

Suppose $T \in \mathcal{L}(V, W)$ and $U$ is a subspace of $V$. Let $\pi$ denote the quotient map from $V$ onto $V/U$. Prove that there exists $S \in \mathcal{L}(V/U, W)$ such that $T = S \circ \pi$ if and only if $U \subseteq \operatorname{null} T$.
3F Duality


Dual Space and Dual Map 


Linear maps into the scalar field F play a special role in linear algebra, and thus 
they get a special name. 


A linear functional on $V$ is a linear map from $V$ to $\mathbf{F}$. In other words, a linear functional is an element of $\mathcal{L}(V, \mathbf{F})$.


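3.109 example: linear functionals

• Define $\varphi \colon \mathbf{R}^3 \to \mathbf{R}$ by $\varphi(x, y, z) = 4x - 5y + 2z$. Then $\varphi$ is a linear functional on $\mathbf{R}^3$.

• Fix $(c_1, \dots, c_n) \in \mathbf{F}^n$. Define $\varphi \colon \mathbf{F}^n \to \mathbf{F}$ by $\varphi(x_1, \dots, x_n) = c_1 x_1 + \cdots + c_n x_n$. Then $\varphi$ is a linear functional on $\mathbf{F}^n$.

• Define $\varphi \colon \mathcal{P}(\mathbf{R}) \to \mathbf{R}$ by

$\varphi(p) = 3p''(5) + 7p(4).$

Then $\varphi$ is a linear functional on $\mathcal{P}(\mathbf{R})$.

• Define $\varphi \colon \mathcal{P}(\mathbf{R}) \to \mathbf{R}$ by

$\varphi(p) = \int_0^1 p$

for each $p \in \mathcal{P}(\mathbf{R})$. Then $\varphi$ is a linear functional on $\mathcal{P}(\mathbf{R})$.

The vector space $\mathcal{L}(V, \mathbf{F})$ also gets a special name and special notation.

The dual space of $V$, denoted by $V'$, is the vector space of all linear functionals on $V$. In other words, $V' = \mathcal{L}(V, \mathbf{F})$.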


3.111 $\dim V' = \dim V$

Suppose $V$ is finite-dimensional. Then $V'$ is also finite-dimensional and

$\dim V' = \dim V.$

Proof By 3.72 we have

$\dim V' = \dim \mathcal{L}(V, \mathbf{F}) = (\dim V)(\dim \mathbf{F}) = \dim V,$


as desired. 
In the following definition, the linear map lemma (3.4) implies that each $\varphi_j$ is well defined.

3.112 definition: dual basis

If $v_1, \dots, v_n$ is a basis of $V$, then the dual basis of $v_1, \dots, v_n$ is the list $\varphi_1, \dots, \varphi_n$ of elements of $V'$, where each $\varphi_j$ is the linear functional on $V$ such that

$\varphi_j(v_k) = \begin{cases} 1 & \text{if } k = j, \\ 0 & \text{if } k \ne j. \end{cases}$


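3.113 example: the dual basis of the standard basis of $\mathbf{F}^n$

Suppose $n$ is a positive integer. For $1 \le j \le n$, define $\varphi_j$ to be the linear functional on $\mathbf{F}^n$ that selects the $j$th coordinate of a vector in $\mathbf{F}^n$. Thus

$\varphi_j(x_1, \dots, x_n) = x_j$

for each $(x_1, \dots, x_n) \in \mathbf{F}^n$.

Let $e_1, \dots, e_n$ be the standard basis of $\mathbf{F}^n$. Then

$\varphi_j(e_k) = \begin{cases} 1 & \text{if } k = j, \\ 0 & \text{if } k \ne j. \end{cases}$

Thus $\varphi_1, \dots, \varphi_n$ is the dual basis of the standard basis $e_1, \dots, e_n$ of $\mathbf{F}^n$.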


The next result shows that the dual basis of a basis of V consists of the linear 
functionals on V that give the coefficients for expressing a vector in V as a linear 
combination of the basis vectors. 


3.114 dual basis gives coefficients for linear combination 


Suppose $v_1, \dots, v_n$ is a basis of $V$ and $\varphi_1, \dots, \varphi_n$ is the dual basis. Then

$v = \varphi_1(v)v_1 + \cdots + \varphi_n(v)v_n$

for each $v \in V$.

Proof Suppose $v \in V$. Then there exist $c_1, \dots, c_n \in \mathbf{F}$ such that

3.115 $v = c_1 v_1 + \cdots + c_n v_n.$

If $j \in \{1, \dots, n\}$, then applying $\varphi_j$ to both sides of the equation above gives

$\varphi_j(v) = c_j.$

Substituting the values for $c_1, \dots, c_n$ given by the equation above into 3.115 shows that $v = \varphi_1(v)v_1 + \cdots + \varphi_n(v)v_n$.
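
For a basis of $\mathbf{F}^n$ the dual basis can be computed explicitly: if $B$ is the matrix whose columns are $v_1, \dots, v_n$, then row $j$ of $B^{-1}$ represents $\varphi_j$, because (row $j$ of $B^{-1}$) times (column $k$ of $B$) is 1 if $k = j$ and 0 otherwise. The sketch below checks 3.114 for one basis of $\mathbf{R}^2$; the helper names and the sample basis are assumptions made for this illustration.

```python
from fractions import Fraction

def inverse(mat):
    """Invert a square matrix over the rationals by Gauss-Jordan elimination."""
    n = len(mat)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(mat)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for i in range(n):
            if i != c:
                factor = aug[i][c]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

def apply(phi, v):
    """Apply the functional represented by the row phi to the vector v."""
    return sum(a * b for a, b in zip(phi, v))

basis = [(1, 2), (3, 5)]                                  # v_1, v_2 (sample basis of R^2)
n = len(basis)
B = [[basis[k][i] for k in range(n)] for i in range(n)]   # columns are v_1, v_2
dual = inverse(B)                                         # row j represents phi_j

v = (4, 7)
coeffs = [apply(phi, v) for phi in dual]                  # phi_1(v), phi_2(v)
rebuilt = tuple(sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(n))
print(rebuilt == v)   # v equals phi_1(v) v_1 + phi_2(v) v_2, as 3.114 asserts
```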
The next result shows that the dual basis is indeed a basis of the dual space. Thus the terminology “dual basis” is justified.

3.116 dual basis is a basis of the dual space

Suppose $V$ is finite-dimensional. Then the dual basis of a basis of $V$ is a basis of $V'$.

Proof Suppose $v_1, \dots, v_n$ is a basis of $V$. Let $\varphi_1, \dots, \varphi_n$ denote the dual basis.

To show that $\varphi_1, \dots, \varphi_n$ is a linearly independent list of elements of $V'$, suppose $a_1, \dots, a_n \in \mathbf{F}$ are such that

3.117 $a_1\varphi_1 + \cdots + a_n\varphi_n = 0.$

Now

$(a_1\varphi_1 + \cdots + a_n\varphi_n)(v_k) = a_k$

for each $k = 1, \dots, n$. Thus 3.117 shows that $a_1 = \cdots = a_n = 0$. Hence $\varphi_1, \dots, \varphi_n$ is linearly independent.

Because $\varphi_1, \dots, \varphi_n$ is a linearly independent list in $V'$ whose length equals $\dim V'$ (by 3.111), we can conclude that $\varphi_1, \dots, \varphi_n$ is a basis of $V'$ (see 2.38).


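In the definition below, note that if $T$ is a linear map from $V$ to $W$ then $T'$ is a linear map from $W'$ to $V'$.

3.118 definition: dual map, $T'$

Suppose $T \in \mathcal{L}(V, W)$. The dual map of $T$ is the linear map $T' \in \mathcal{L}(W', V')$ defined for each $\varphi \in W'$ by

$T'(\varphi) = \varphi \circ T.$

If $T \in \mathcal{L}(V, W)$ and $\varphi \in W'$, then $T'(\varphi)$ is defined above to be the composition of the linear maps $\varphi$ and $T$. Thus $T'(\varphi)$ is indeed a linear map from $V$ to $\mathbf{F}$; in other words, $T'(\varphi) \in V'$.

The following two bullet points show that $T'$ is a linear map from $W'$ to $V'$.

• If $\varphi, \psi \in W'$, then

$T'(\varphi + \psi) = (\varphi + \psi) \circ T = \varphi \circ T + \psi \circ T = T'(\varphi) + T'(\psi).$

• If $\lambda \in \mathbf{F}$ and $\varphi \in W'$, then

$T'(\lambda\varphi) = (\lambda\varphi) \circ T = \lambda(\varphi \circ T) = \lambda T'(\varphi).$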


The prime notation appears with two unrelated meanings in the next example: 
D’ denotes the dual of the linear map D, and p’ denotes the derivative of a 
polynomial p. 
3.119 example: dual map of the differentiation linear map

Define $D \colon \mathcal{P}(\mathbf{R}) \to \mathcal{P}(\mathbf{R})$ by $Dp = p'$.

• Suppose $\varphi$ is the linear functional on $\mathcal{P}(\mathbf{R})$ defined by $\varphi(p) = p(3)$. Then $D'(\varphi)$ is the linear functional on $\mathcal{P}(\mathbf{R})$ given by

$(D'(\varphi))(p) = (\varphi \circ D)(p) = \varphi(Dp) = \varphi(p') = p'(3).$

Thus $D'(\varphi)$ is the linear functional on $\mathcal{P}(\mathbf{R})$ taking $p$ to $p'(3)$.

• Suppose $\varphi$ is the linear functional on $\mathcal{P}(\mathbf{R})$ defined by $\varphi(p) = \int_0^1 p$. Then $D'(\varphi)$ is the linear functional on $\mathcal{P}(\mathbf{R})$ given by

$(D'(\varphi))(p) = (\varphi \circ D)(p) = \varphi(p') = \int_0^1 p' = p(1) - p(0).$

Thus $D'(\varphi)$ is the linear functional on $\mathcal{P}(\mathbf{R})$ taking $p$ to $p(1) - p(0)$.
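
The two computations in this example can be reproduced mechanically. In the sketch below, a polynomial is stored as its list of coefficients with the constant term first; this representation and the names `evaluate`, `D`, `integral_0_1`, and `dual_map` are assumptions of the illustration, not notation from the text.

```python
from fractions import Fraction

def evaluate(p, x):
    """Value at x of the polynomial with coefficient list p (constant term first)."""
    return sum(c * x**k for k, c in enumerate(p))

def D(p):
    """Differentiation: the coefficient list of p'."""
    return [k * c for k, c in enumerate(p)][1:]

def integral_0_1(p):
    """The linear functional taking p to its integral over [0, 1]."""
    return sum(Fraction(c, k + 1) for k, c in enumerate(p))

def dual_map(T):
    """Return T', which sends a functional phi to the composition phi o T."""
    return lambda phi: (lambda p: phi(T(p)))

p = [5, -1, 0, 2]                    # p(x) = 5 - x + 2x^3, so p'(x) = -1 + 6x^2
eval_at_3 = lambda q: evaluate(q, 3)

print(dual_map(D)(eval_at_3)(p))     # equals p'(3)
print(dual_map(D)(integral_0_1)(p))  # equals p(1) - p(0)
```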

In the next result, (a) and (b) imply that the function that takes $T$ to $T'$ is a linear map from $\mathcal{L}(V, W)$ to $\mathcal{L}(W', V')$.

In (c) below, note the reversal of order from $ST$ on the left to $T'S'$ on the right.


3.120 algebraic properties of dual maps 


Suppose $T \in \mathcal{L}(V, W)$. Then

(a) $(S + T)' = S' + T'$ for all $S \in \mathcal{L}(V, W)$;

(b) $(\lambda T)' = \lambda T'$ for all $\lambda \in \mathbf{F}$;

(c) $(ST)' = T'S'$ for all $S \in \mathcal{L}(W, U)$.

Proof The proofs of (a) and (b) are left to the reader.

To prove (c), suppose $\varphi \in U'$. Then

$(ST)'(\varphi) = \varphi \circ (ST) = (\varphi \circ S) \circ T = T'(\varphi \circ S) = T'(S'(\varphi)) = (T'S')(\varphi),$

where the first, third, and fourth equalities above hold because of the definition of the dual map, the second equality holds because composition of functions is associative, and the last equality follows from the definition of composition.

The equation above shows that $(ST)'(\varphi) = (T'S')(\varphi)$ for all $\varphi \in U'$. Thus $(ST)' = T'S'$.

[Margin note: Some books use the notation $V^*$ and $T^*$ for duality instead of $V'$ and $T'$. However, here we reserve the notation $T^*$ for the adjoint, which will be introduced when we study linear maps on inner product spaces in Chapter 7.]
Null Space and Range of Dual of Linear Map

Our goal in this subsection is to describe $\operatorname{null} T'$ and $\operatorname{range} T'$ in terms of $\operatorname{range} T$ and $\operatorname{null} T$. To do this, we will need the next definition.

3.121 definition: annihilator, $U^0$

For $U \subseteq V$, the annihilator of $U$, denoted by $U^0$, is defined by

$U^0 = \{\varphi \in V' : \varphi(u) = 0 \text{ for all } u \in U\}.$

3.122 example: element of an annihilator

Suppose $U$ is the subspace of $\mathcal{P}(\mathbf{R})$ consisting of polynomial multiples of $x^2$. If $\varphi$ is the linear functional on $\mathcal{P}(\mathbf{R})$ defined by $\varphi(p) = p'(0)$, then $\varphi \in U^0$.

For $U \subseteq V$, the annihilator $U^0$ is a subset of the dual space $V'$. Thus $U^0$ depends on the vector space containing $U$, so a notation such as $U^0_V$ would be more precise. However, the containing vector space will always be clear from the context, so we will use the simpler notation $U^0$.


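3.123 example: the annihilator of a two-dimensional subspace of $\mathbf{R}^5$

Let $e_1, e_2, e_3, e_4, e_5$ denote the standard basis of $\mathbf{R}^5$; let $\varphi_1, \varphi_2, \varphi_3, \varphi_4, \varphi_5 \in (\mathbf{R}^5)'$ denote the dual basis of $e_1, e_2, e_3, e_4, e_5$. Suppose

$U = \operatorname{span}(e_1, e_2) = \{(x_1, x_2, 0, 0, 0) \in \mathbf{R}^5 : x_1, x_2 \in \mathbf{R}\}.$

We want to show that $U^0 = \operatorname{span}(\varphi_3, \varphi_4, \varphi_5)$.

Recall (see 3.113) that $\varphi_j$ is the linear functional on $\mathbf{R}^5$ that selects the $j$th coordinate: $\varphi_j(x_1, x_2, x_3, x_4, x_5) = x_j$.

First suppose $\varphi \in \operatorname{span}(\varphi_3, \varphi_4, \varphi_5)$. Then there exist $c_3, c_4, c_5 \in \mathbf{R}$ such that $\varphi = c_3\varphi_3 + c_4\varphi_4 + c_5\varphi_5$. If $(x_1, x_2, 0, 0, 0) \in U$, then

$\varphi(x_1, x_2, 0, 0, 0) = (c_3\varphi_3 + c_4\varphi_4 + c_5\varphi_5)(x_1, x_2, 0, 0, 0) = 0.$

Thus $\varphi \in U^0$. Hence we have shown that $\operatorname{span}(\varphi_3, \varphi_4, \varphi_5) \subseteq U^0$.

To show the inclusion in the other direction, suppose that $\varphi \in U^0$. Because the dual basis is a basis of $(\mathbf{R}^5)'$, there exist $c_1, c_2, c_3, c_4, c_5 \in \mathbf{R}$ such that $\varphi = c_1\varphi_1 + c_2\varphi_2 + c_3\varphi_3 + c_4\varphi_4 + c_5\varphi_5$. Because $e_1 \in U$ and $\varphi \in U^0$, we have

$0 = \varphi(e_1) = (c_1\varphi_1 + c_2\varphi_2 + c_3\varphi_3 + c_4\varphi_4 + c_5\varphi_5)(e_1) = c_1.$

Similarly, $e_2 \in U$ and thus $c_2 = 0$. Hence $\varphi = c_3\varphi_3 + c_4\varphi_4 + c_5\varphi_5$. Thus $\varphi \in \operatorname{span}(\varphi_3, \varphi_4, \varphi_5)$, which shows that $U^0 \subseteq \operatorname{span}(\varphi_3, \varphi_4, \varphi_5)$.

Thus $U^0 = \operatorname{span}(\varphi_3, \varphi_4, \varphi_5)$.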
3.124 the annihilator is a subspace

Suppose $U \subseteq V$. Then $U^0$ is a subspace of $V'$.

Proof Note that $0 \in U^0$ (here 0 is the zero linear functional on $V$) because the zero linear functional applied to every vector in $U$ equals $0 \in \mathbf{F}$.

Suppose $\varphi, \psi \in U^0$. Thus $\varphi, \psi \in V'$ and $\varphi(u) = \psi(u) = 0$ for every $u \in U$. If $u \in U$, then

$(\varphi + \psi)(u) = \varphi(u) + \psi(u) = 0 + 0 = 0.$

Thus $\varphi + \psi \in U^0$.

Similarly, $U^0$ is closed under scalar multiplication. Thus 1.34 implies that $U^0$ is a subspace of $V'$.

The next result shows that $\dim U^0$ is the difference of $\dim V$ and $\dim U$. For example, this shows that if $U$ is a two-dimensional subspace of $\mathbf{R}^5$, then $U^0$ is a three-dimensional subspace of $(\mathbf{R}^5)'$, as in Example 3.123.

The next result can be proved following the pattern of Example 3.123: choose a basis $u_1, \dots, u_m$ of $U$, extend to a basis $u_1, \dots, u_m, \dots, u_n$ of $V$, let $\varphi_1, \dots, \varphi_m, \dots, \varphi_n$ be the dual basis of $V'$, and then show that $\varphi_{m+1}, \dots, \varphi_n$ is a basis of $U^0$, which implies the desired result. You should construct the proof just outlined, even though a slicker proof is presented here.


3.125 dimension of the annihilator 


Suppose $V$ is finite-dimensional and $U$ is a subspace of $V$. Then

$\dim U^0 = \dim V - \dim U.$

Proof Let $i \in \mathcal{L}(U, V)$ be the inclusion map defined by $i(u) = u$ for each $u \in U$. Thus $i'$ is a linear map from $V'$ to $U'$. The fundamental theorem of linear maps (3.21) applied to $i'$ shows that

$\dim \operatorname{range} i' + \dim \operatorname{null} i' = \dim V'.$

However, $\operatorname{null} i' = U^0$ (as can be seen by thinking about the definitions) and $\dim V' = \dim V$ (by 3.111), so we can rewrite the equation above as

3.126 $\dim \operatorname{range} i' + \dim U^0 = \dim V.$

If $\varphi \in U'$, then $\varphi$ can be extended to a linear functional $\psi$ on $V$ (see, for example, Exercise 13 in Section 3A). The definition of $i'$ shows that $i'(\psi) = \varphi$. Thus $\varphi \in \operatorname{range} i'$, which implies that $\operatorname{range} i' = U'$. Hence

$\dim \operatorname{range} i' = \dim U' = \dim U,$

and then 3.126 becomes the equation $\dim U + \dim U^0 = \dim V$, as desired.
The next result can be a useful tool to show that a subspace is as big as possible, see (a), or to show that a subspace is as small as possible, see (b).

3.127 condition for the annihilator to equal $\{0\}$ or the whole space

Suppose $V$ is finite-dimensional and $U$ is a subspace of $V$. Then

(a) $U^0 = \{0\} \iff U = V$;

(b) $U^0 = V' \iff U = \{0\}$.

Proof To prove (a), we have

$U^0 = \{0\} \iff \dim U^0 = 0$
$\iff \dim U = \dim V$
$\iff U = V,$

where the second equivalence follows from 3.125 and the third equivalence follows from 2.39.

Similarly, to prove (b) we have

$U^0 = V' \iff \dim U^0 = \dim V'$
$\iff \dim U^0 = \dim V$
$\iff \dim U = 0$
$\iff U = \{0\},$

where one direction of the first equivalence follows from 2.39, the second equivalence follows from 3.111, and the third equivalence follows from 3.125.


The proof of (a) in the next result does not use the hypothesis that V and W 
are finite-dimensional. 


3.128 the null space of $T'$

Suppose $V$ and $W$ are finite-dimensional and $T \in \mathcal{L}(V, W)$. Then

(a) $\operatorname{null} T' = (\operatorname{range} T)^0$;

(b) $\dim \operatorname{null} T' = \dim \operatorname{null} T + \dim W - \dim V$.

Proof

(a) First suppose $\varphi \in \operatorname{null} T'$. Thus $0 = T'(\varphi) = \varphi \circ T$. Hence

$0 = (\varphi \circ T)(v) = \varphi(Tv)$ for every $v \in V$.

Thus $\varphi \in (\operatorname{range} T)^0$. This implies that $\operatorname{null} T' \subseteq (\operatorname{range} T)^0$.

To prove the inclusion in the opposite direction, now suppose $\varphi \in (\operatorname{range} T)^0$. Thus $\varphi(Tv) = 0$ for every vector $v \in V$. Hence $0 = \varphi \circ T = T'(\varphi)$. In other words, $\varphi \in \operatorname{null} T'$, which shows that $(\operatorname{range} T)^0 \subseteq \operatorname{null} T'$, completing the proof of (a).

(b) We have

$\dim \operatorname{null} T' = \dim (\operatorname{range} T)^0$
$= \dim W - \dim \operatorname{range} T$
$= \dim W - (\dim V - \dim \operatorname{null} T)$
$= \dim \operatorname{null} T + \dim W - \dim V,$

where the first equality comes from (a), the second equality comes from 3.125, and the third equality comes from the fundamental theorem of linear maps (3.21).


The next result can be useful because sometimes it is easier to verify that T’ 
is injective than to show directly that T is surjective. 


3.129 $T$ surjective is equivalent to $T'$ injective

Suppose $V$ and $W$ are finite-dimensional and $T \in \mathcal{L}(V, W)$. Then

$T$ is surjective $\iff$ $T'$ is injective.

Proof We have

$T \in \mathcal{L}(V, W)$ is surjective $\iff \operatorname{range} T = W$
$\iff (\operatorname{range} T)^0 = \{0\}$
$\iff \operatorname{null} T' = \{0\}$
$\iff T'$ is injective,

where the second equivalence comes from 3.127(a) and the third equivalence comes from 3.128(a).


3.130 the range of $T'$

Suppose $V$ and $W$ are finite-dimensional and $T \in \mathcal{L}(V, W)$. Then

(a) $\dim \operatorname{range} T' = \dim \operatorname{range} T$;

(b) $\operatorname{range} T' = (\operatorname{null} T)^0$.

Proof

(a) We have

$\dim \operatorname{range} T' = \dim W' - \dim \operatorname{null} T'$
$= \dim W - \dim (\operatorname{range} T)^0$
$= \dim \operatorname{range} T,$

where the first equality comes from 3.21, the second equality comes from 3.111 and 3.128(a), and the third equality comes from 3.125.

(b) First suppose $\varphi \in \operatorname{range} T'$. Thus there exists $\psi \in W'$ such that $\varphi = T'(\psi)$. If $v \in \operatorname{null} T$, then

$\varphi(v) = (T'(\psi))v = (\psi \circ T)(v) = \psi(Tv) = \psi(0) = 0.$

Hence $\varphi \in (\operatorname{null} T)^0$. This implies that $\operatorname{range} T' \subseteq (\operatorname{null} T)^0$.

We will complete the proof by showing that $\operatorname{range} T'$ and $(\operatorname{null} T)^0$ have the same dimension. To do this, note that

$\dim \operatorname{range} T' = \dim \operatorname{range} T$
$= \dim V - \dim \operatorname{null} T$
$= \dim (\operatorname{null} T)^0,$

where the first equality comes from (a), the second equality comes from 3.21, and the third equality comes from 3.125.


The next result should be compared to 3.129. 


3.131 $T$ injective is equivalent to $T'$ surjective

Suppose $V$ and $W$ are finite-dimensional and $T \in \mathcal{L}(V, W)$. Then

$T$ is injective $\iff$ $T'$ is surjective.

Proof We have

$T$ is injective $\iff \operatorname{null} T = \{0\}$
$\iff (\operatorname{null} T)^0 = V'$
$\iff \operatorname{range} T' = V',$

where the second equivalence follows from 3.127(b) and the third equivalence follows from 3.130(b).


Matrix of Dual of Linear Map 


The setting for the next result is the assumption that we have a basis $v_1, \dots, v_n$ of $V$, along with its dual basis $\varphi_1, \dots, \varphi_n$ of $V'$. We also have a basis $w_1, \dots, w_m$ of $W$, along with its dual basis $\psi_1, \dots, \psi_m$ of $W'$. Thus $\mathcal{M}(T)$ is computed with respect to the bases just mentioned of $V$ and $W$, and $\mathcal{M}(T')$ is computed with respect to the dual bases just mentioned of $W'$ and $V'$. Using these bases gives the following pretty result.

3.132 matrix of $T'$ is transpose of matrix of $T$

Suppose $V$ and $W$ are finite-dimensional and $T \in \mathcal{L}(V, W)$. Then

$\mathcal{M}(T') = (\mathcal{M}(T))^t.$
Proof Let $A = \mathcal{M}(T)$ and $C = \mathcal{M}(T')$. Suppose $1 \le j \le m$ and $1 \le k \le n$.

From the definition of $\mathcal{M}(T')$ we have

$T'(\psi_j) = \sum_{r=1}^{n} C_{r,j}\,\varphi_r.$

The left side of the equation above equals $\psi_j \circ T$. Thus applying both sides of the equation above to $v_k$ gives

$(\psi_j \circ T)(v_k) = \sum_{r=1}^{n} C_{r,j}\,\varphi_r(v_k) = C_{k,j}.$

We also have

$(\psi_j \circ T)(v_k) = \psi_j(Tv_k)$
$= \psi_j\Bigl(\sum_{r=1}^{m} A_{r,k}\,w_r\Bigr)$
$= \sum_{r=1}^{m} A_{r,k}\,\psi_j(w_r)$
$= A_{j,k}.$

Comparing the last line of the last two sets of equations, we have $C_{k,j} = A_{j,k}$. Thus $C = A^t$. In other words, $\mathcal{M}(T') = (\mathcal{M}(T))^t$, as desired.
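
The computation in this proof can be imitated for a map between $\mathbf{F}^3$ and $\mathbf{F}^2$ with standard bases. The sketch below builds $\mathcal{M}(T')$ entry by entry, using the fact from the proof that the coefficient of $\varphi_r$ in $T'(\psi_j)$ is obtained by applying $T'(\psi_j)$ to the $r$th basis vector. The matrix `A` and all function names are hypothetical choices for this illustration.

```python
A = [[1, 2, 3],
     [4, 5, 6]]   # M(T) for a sample T from F^3 to F^2, standard bases
m, n = len(A), len(A[0])

def T(v):
    """The linear map v -> Av."""
    return [sum(A[j][k] * v[k] for k in range(n)) for j in range(m)]

def e(k, size):
    """The k-th standard basis vector (0-indexed) of F^size."""
    return [int(i == k) for i in range(size)]

def psi(j):
    """The j-th dual basis functional on F^m: select coordinate j."""
    return lambda w: w[j]

def dual_T(functional):
    """T' applied to a functional on F^m: the composition functional o T."""
    return lambda v: functional(T(v))

# Entry (r, j) of M(T') is the coefficient of phi_r in T'(psi_j),
# which equals T'(psi_j) applied to e_r.
C = [[dual_T(psi(j))(e(r, n)) for j in range(m)] for r in range(n)]
transpose = [[A[j][r] for j in range(m)] for r in range(n)]
print(C == transpose)
```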


Now we use duality to give an alternative proof that the column rank of a 
matrix equals the row rank of the matrix. This result was previously proved using 
different tools—see 3.57. 


3.133 column rank equals row rank 


Suppose $A \in \mathbf{F}^{m,n}$. Then the column rank of $A$ equals the row rank of $A$.

Proof Define $T \colon \mathbf{F}^{n,1} \to \mathbf{F}^{m,1}$ by $Tx = Ax$. Thus $\mathcal{M}(T) = A$, where $\mathcal{M}(T)$ is computed with respect to the standard bases of $\mathbf{F}^{n,1}$ and $\mathbf{F}^{m,1}$. Now

column rank of $A$ = column rank of $\mathcal{M}(T)$
= $\dim \operatorname{range} T$
= $\dim \operatorname{range} T'$
= column rank of $\mathcal{M}(T')$
= column rank of $A^t$
= row rank of $A$,

where the second equality comes from 3.78, the third equality comes from 3.130(a), the fourth equality comes from 3.78, the fifth equality comes from 3.132, and the last equality follows from the definitions of row and column rank.
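
As a numerical sanity check of 3.133 (not a substitute for the proof), one can compute the dimension of the span of the rows and of the columns of a matrix separately. The sketch below does this by row reduction over the rationals; the sample matrix and the names `span_dim`, `row_rank`, and `column_rank` are assumptions of this illustration.

```python
from fractions import Fraction

def span_dim(vectors):
    """Dimension of the span of a list of vectors (Gaussian elimination over Q)."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def row_rank(A):
    """Dimension of the span of the rows of A."""
    return span_dim(A)

def column_rank(A):
    """Dimension of the span of the columns of A."""
    return span_dim([list(col) for col in zip(*A)])

A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]   # the second row is twice the first

print(row_rank(A) == column_rank(A))
```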


See Exercise 8 in Section 7A for another alternative proof of the result above. 
Exercises 3F

Explain why each linear functional is surjective or is the zero map.

Give three distinct examples of linear functionals on $\mathbf{R}^{[0,1]}$.

Suppose $V$ is finite-dimensional and $v \in V$ with $v \ne 0$. Prove that there exists $\varphi \in V'$ such that $\varphi(v) = 1$.

Suppose $V$ is finite-dimensional and $U$ is a subspace of $V$ such that $U \ne V$. Prove that there exists $\varphi \in V'$ such that $\varphi(u) = 0$ for every $u \in U$ but $\varphi \ne 0$.

Suppose $T \in \mathcal{L}(V, W)$ and $w_1, \dots, w_m$ is a basis of $\operatorname{range} T$. Hence for each $v \in V$, there exist unique numbers $\varphi_1(v), \dots, \varphi_m(v)$ such that

$Tv = \varphi_1(v)w_1 + \cdots + \varphi_m(v)w_m,$

thus defining functions $\varphi_1, \dots, \varphi_m$ from $V$ to $\mathbf{F}$. Show that each of the functions $\varphi_1, \dots, \varphi_m$ is a linear functional on $V$.

Suppose $\varphi, \beta \in V'$. Prove that $\operatorname{null} \varphi \subseteq \operatorname{null} \beta$ if and only if there exists $c \in \mathbf{F}$ such that $\beta = c\varphi$.

Suppose that $V_1, \dots, V_m$ are vector spaces. Prove that $(V_1 \times \cdots \times V_m)'$ and $V_1' \times \cdots \times V_m'$ are isomorphic vector spaces.

Suppose $v_1, \dots, v_n$ is a basis of $V$ and $\varphi_1, \dots, \varphi_n$ is the dual basis of $V'$. Define $\Gamma \colon V \to \mathbf{F}^n$ and $\Lambda \colon \mathbf{F}^n \to V$ by

$\Gamma(v) = (\varphi_1(v), \dots, \varphi_n(v))$ and $\Lambda(a_1, \dots, a_n) = a_1 v_1 + \cdots + a_n v_n.$

Explain why $\Gamma$ and $\Lambda$ are inverses of each other.

Suppose $m$ is a positive integer. Show that the dual basis of the basis $1, x, \dots, x^m$ of $\mathcal{P}_m(\mathbf{R})$ is $\varphi_0, \varphi_1, \dots, \varphi_m$, where

$\varphi_k(p) = \dfrac{p^{(k)}(0)}{k!}.$

Here $p^{(k)}$ denotes the $k$th derivative of $p$, with the understanding that the $0$th derivative of $p$ is $p$.

Suppose $m$ is a positive integer.

(a) Show that $1, x - 5, \dots, (x - 5)^m$ is a basis of $\mathcal{P}_m(\mathbf{R})$.

(b) What is the dual basis of the basis in (a)?

Suppose $v_1, \dots, v_n$ is a basis of $V$ and $\varphi_1, \dots, \varphi_n$ is the corresponding dual basis of $V'$. Suppose $\psi \in V'$. Prove that

$\psi = \psi(v_1)\varphi_1 + \cdots + \psi(v_n)\varphi_n.$
Suppose $S, T \in \mathcal{L}(V, W)$.

(a) Prove that $(S + T)' = S' + T'$.

(b) Prove that $(\lambda T)' = \lambda T'$ for all $\lambda \in \mathbf{F}$.

This exercise asks you to verify (a) and (b) in 3.120.

Show that the dual map of the identity operator on $V$ is the identity operator on $V'$.

Define $T \colon \mathbf{R}^3 \to \mathbf{R}^2$ by

$T(x, y, z) = (4x + 5y + 6z, 7x + 8y + 9z).$

Suppose $\varphi_1, \varphi_2$ denotes the dual basis of the standard basis of $\mathbf{R}^2$ and $\psi_1, \psi_2, \psi_3$ denotes the dual basis of the standard basis of $\mathbf{R}^3$.

(a) Describe the linear functionals $T'(\varphi_1)$ and $T'(\varphi_2)$.

(b) Write $T'(\varphi_1)$ and $T'(\varphi_2)$ as linear combinations of $\psi_1, \psi_2, \psi_3$.

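Define $T \colon \mathcal{P}(\mathbf{R}) \to \mathcal{P}(\mathbf{R})$ by

$(Tp)(x) = x^2 p(x) + p''(x)$

for each $x \in \mathbf{R}$.

(a) Suppose $\varphi \in \mathcal{P}(\mathbf{R})'$ is defined by $\varphi(p) = p'(4)$. Describe the linear functional $T'(\varphi)$ on $\mathcal{P}(\mathbf{R})$.

(b) Suppose $\varphi \in \mathcal{P}(\mathbf{R})'$ is defined by $\varphi(p) = \int_0^1 p$. Evaluate $(T'(\varphi))(x^3)$.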


Suppose $W$ is finite-dimensional and $T \in \mathcal{L}(V, W)$. Prove that

$T' = 0 \iff T = 0.$

Suppose $V$ and $W$ are finite-dimensional and $T \in \mathcal{L}(V, W)$. Prove that $T$ is invertible if and only if $T' \in \mathcal{L}(W', V')$ is invertible.

Suppose $V$ and $W$ are finite-dimensional. Prove that the map that takes $T \in \mathcal{L}(V, W)$ to $T' \in \mathcal{L}(W', V')$ is an isomorphism of $\mathcal{L}(V, W)$ onto $\mathcal{L}(W', V')$.

Suppose $U \subseteq V$. Explain why

$U^0 = \{\varphi \in V' : U \subseteq \operatorname{null} \varphi\}.$

Suppose $V$ is finite-dimensional and $U$ is a subspace of $V$. Show that

$U = \{v \in V : \varphi(v) = 0 \text{ for every } \varphi \in U^0\}.$

Suppose $V$ is finite-dimensional and $U$ and $W$ are subspaces of $V$.

(a) Prove that $W^0 \subseteq U^0$ if and only if $U \subseteq W$.

(b) Prove that $W^0 = U^0$ if and only if $U = W$.
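Suppose $V$ is finite-dimensional and $U$ and $W$ are subspaces of $V$.

(a) Show that $(U + W)^0 = U^0 \cap W^0$.

(b) Show that $(U \cap W)^0 = U^0 + W^0$.

Suppose $V$ is finite-dimensional and $\varphi_1, \dots, \varphi_m \in V'$. Prove that the following three sets are equal to each other.

(a) $\operatorname{span}(\varphi_1, \dots, \varphi_m)$

(b) $\bigl((\operatorname{null} \varphi_1) \cap \cdots \cap (\operatorname{null} \varphi_m)\bigr)^0$

(c) $\{\varphi \in V' : (\operatorname{null} \varphi_1) \cap \cdots \cap (\operatorname{null} \varphi_m) \subseteq \operatorname{null} \varphi\}$

Suppose $V$ is finite-dimensional and $v_1, \dots, v_m \in V$. Define a linear map $\Gamma \colon V' \to \mathbf{F}^m$ by $\Gamma(\varphi) = (\varphi(v_1), \dots, \varphi(v_m))$.

(a) Prove that $v_1, \dots, v_m$ spans $V$ if and only if $\Gamma$ is injective.

(b) Prove that $v_1, \dots, v_m$ is linearly independent if and only if $\Gamma$ is surjective.

Suppose $V$ is finite-dimensional and $\varphi_1, \dots, \varphi_m \in V'$. Define a linear map $\Gamma \colon V \to \mathbf{F}^m$ by $\Gamma(v) = (\varphi_1(v), \dots, \varphi_m(v))$.

(a) Prove that $\varphi_1, \dots, \varphi_m$ spans $V'$ if and only if $\Gamma$ is injective.

(b) Prove that $\varphi_1, \dots, \varphi_m$ is linearly independent if and only if $\Gamma$ is surjective.

Suppose $V$ is finite-dimensional and $\Omega$ is a subspace of $V'$. Prove that

$\Omega = \{v \in V : \varphi(v) = 0 \text{ for every } \varphi \in \Omega\}^0.$

Suppose $T \in \mathcal{L}(\mathcal{P}_5(\mathbf{R}))$ and $\operatorname{null} T' = \operatorname{span}(\varphi)$, where $\varphi$ is the linear functional on $\mathcal{P}_5(\mathbf{R})$ defined by $\varphi(p) = p(8)$. Prove that

$\operatorname{range} T = \{p \in \mathcal{P}_5(\mathbf{R}) : p(8) = 0\}.$

Suppose $V$ is finite-dimensional and $\varphi_1, \dots, \varphi_m$ is a linearly independent list in $V'$. Prove that

$\dim\bigl((\operatorname{null} \varphi_1) \cap \cdots \cap (\operatorname{null} \varphi_m)\bigr) = (\dim V) - m.$

Suppose $V$ and $W$ are finite-dimensional and $T \in \mathcal{L}(V, W)$.

(a) Prove that if $\varphi \in W'$ and $\operatorname{null} T' = \operatorname{span}(\varphi)$, then $\operatorname{range} T = \operatorname{null} \varphi$.

(b) Prove that if $\varphi \in V'$ and $\operatorname{range} T' = \operatorname{span}(\varphi)$, then $\operatorname{null} T = \operatorname{null} \varphi$.

Suppose $V$ is finite-dimensional and $\varphi_1, \dots, \varphi_n$ is a basis of $V'$. Show that there exists a basis of $V$ whose dual basis is $\varphi_1, \dots, \varphi_n$.

Suppose $U$ is a subspace of $V$. Let $i \colon U \to V$ be the inclusion map defined by $i(u) = u$. Thus $i' \in \mathcal{L}(V', U')$.

(a) Show that $\operatorname{null} i' = U^0$.

(b) Prove that if $V$ is finite-dimensional, then $\operatorname{range} i' = U'$.

(c) Prove that if $V$ is finite-dimensional, then $\widetilde{i'}$ is an isomorphism from $V'/U^0$ onto $U'$.

The isomorphism in (c) is natural in that it does not depend on a choice of basis in either vector space.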
The double dual space of $V$, denoted by $V''$, is defined to be the dual space of $V'$. In other words, $V'' = (V')'$. Define $\Lambda \colon V \to V''$ by

$(\Lambda v)(\varphi) = \varphi(v)$

for each $v \in V$ and each $\varphi \in V'$.

(a) Show that $\Lambda$ is a linear map from $V$ to $V''$.

(b) Show that if $T \in \mathcal{L}(V)$, then $T'' \circ \Lambda = \Lambda \circ T$, where $T'' = (T')'$.

(c) Show that if $V$ is finite-dimensional, then $\Lambda$ is an isomorphism from $V$ onto $V''$.

Suppose $V$ is finite-dimensional. Then $V$ and $V'$ are isomorphic, but finding an isomorphism from $V$ onto $V'$ generally requires choosing a basis of $V$. In contrast, the isomorphism $\Lambda$ from $V$ onto $V''$ does not require a choice of basis and thus is considered more natural.

Suppose $U$ is a subspace of $V$. Let $\pi \colon V \to V/U$ be the usual quotient map. Thus $\pi' \in \mathcal{L}((V/U)', V')$.

(a) Show that $\pi'$ is injective.

(b) Show that $\operatorname{range} \pi' = U^0$.

(c) Conclude that $\pi'$ is an isomorphism from $(V/U)'$ onto $U^0$.

The isomorphism in (c) is natural in that it does not depend on a choice of basis in either vector space. In fact, there is no assumption here that any of these vector spaces are finite-dimensional.


Chapter 4
Polynomials 


This chapter contains material on polynomials that we will use to investigate 
linear maps from a vector space to itself. Many results in this chapter will already 
be familiar to you from other courses; they are included here for completeness. 

Because this chapter is not about linear algebra, your instructor may go through 
it rapidly. You may not be asked to scrutinize all the proofs. Make sure, however, 
that you at least read and understand the statements of all results in this chapter— 
they will be used in later chapters. 

This chapter begins with a brief discussion of algebraic properties of the 
complex numbers. Then we prove that a nonconstant polynomial cannot have 
more zeros than its degree. We also give a linear-algebra-based proof of the 
division algorithm for polynomials, which is worth reading even if you are already 
familiar with a proof that does not use linear algebra. 

As we will see, the fundamental theorem of algebra leads to a factorization of 
every polynomial into degree-one factors if the scalar field is C or to factors of 
degree at most two if the scalar field is R. 


[Photo caption, first part missing] … book written in 1070 contained the first serious study of cubic polynomials.

Before discussing polynomials with complex or real coefficients, we need to
learn a bit more about the complex numbers.


4.1 definition: real part, Re z, imaginary part, Im z


Suppose z = a + bi, where a and b are real numbers.

• The real part of z, denoted by Re z, is defined by Re z = a.
• The imaginary part of z, denoted by Im z, is defined by Im z = b.


Thus for every complex number z, we have

\[ z = \operatorname{Re} z + (\operatorname{Im} z)i. \]


4.2 definition: complex conjugate, z̄, absolute value, |z|


Suppose z ∈ C.

• The complex conjugate of z, denoted by z̄, is defined by

\[ \bar{z} = \operatorname{Re} z - (\operatorname{Im} z)i. \]

• The absolute value of z, denoted by |z|, is defined by

\[ |z| = \sqrt{(\operatorname{Re} z)^2 + (\operatorname{Im} z)^2}. \]


4.3 example: real and imaginary part, complex conjugate, absolute value


Suppose z = 3 + 2i. Then

• Re z = 3 and Im z = 2;
• z̄ = 3 − 2i;
• |z| = √(3² + 2²) = √13.
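
The numbers in this example can be reproduced with Python's built-in complex type. This is only an illustrative sketch: Python writes i as j, and the variable names below are chosen here for illustration rather than taken from the text.

```python
# Illustration of 4.1-4.3 using Python's built-in complex numbers.
z = 3 + 2j                   # z = 3 + 2i (Python writes i as j)

real_part = z.real           # Re z
imag_part = z.imag           # Im z
conjugate = z.conjugate()    # complex conjugate of z
absolute_value = abs(z)      # |z| = sqrt((Re z)^2 + (Im z)^2)

print(real_part, imag_part)  # 3.0 2.0
print(conjugate)             # (3-2j)
print(absolute_value)        # a floating-point approximation of sqrt(13)
```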


Identifying a complex number z ∈ C with the ordered pair (Re z, Im z) ∈ R²
identifies C with R². Note that C is a one-dimensional complex vector space,
but we can also think of C (identified with R²) as a two-dimensional real vector
space.

The absolute value of each complex number is a nonnegative number. Specifically,
if z ∈ C, then |z| equals the distance from the origin in R² to the point
(Re z, Im z) ∈ R².

The real and imaginary parts, complex conjugate, and absolute value have
the properties listed in the following multipart result.

You should verify that z = z̄ if and only if z is a real number.


4.4 properties of complex numbers


Suppose w, z ∈ C. Then the following equalities and inequalities hold.

sum of z and z̄
\[ z + \bar{z} = 2 \operatorname{Re} z. \]

difference of z and z̄
\[ z - \bar{z} = 2(\operatorname{Im} z)i. \]

product of z and z̄
\[ z\bar{z} = |z|^2. \]

additivity and multiplicativity of complex conjugate
\[ \overline{w + z} = \bar{w} + \bar{z} \quad\text{and}\quad \overline{wz} = \bar{w}\,\bar{z}. \]

double complex conjugate
\[ \bar{\bar{z}} = z. \]

real and imaginary parts are bounded by |z|
\[ |\operatorname{Re} z| \le |z| \quad\text{and}\quad |\operatorname{Im} z| \le |z|. \]

absolute value of the complex conjugate
\[ |\bar{z}| = |z|. \]

multiplicativity of absolute value
\[ |wz| = |w|\,|z|. \]

triangle inequality
\[ |w + z| \le |w| + |z|. \]


Proof Except for the last item above, the routine verifications of the assertions
above are left to the reader. To verify the triangle inequality, we have

\[
\begin{aligned}
|w + z|^2 &= (w + z)(\bar{w} + \bar{z}) \\
&= w\bar{w} + z\bar{z} + w\bar{z} + z\bar{w} \\
&= |w|^2 + |z|^2 + w\bar{z} + \overline{w\bar{z}} \\
&= |w|^2 + |z|^2 + 2\operatorname{Re}(w\bar{z}) \\
&\le |w|^2 + |z|^2 + 2|w\bar{z}| \\
&= |w|^2 + |z|^2 + 2|w|\,|z| \\
&= (|w| + |z|)^2.
\end{aligned}
\]

Taking square roots now gives the desired inequality |w + z| ≤ |w| + |z|.

Geometric interpretation of triangle inequality: The length of each side of a
triangle is less than or equal to the sum of the lengths of the two other sides.

See Exercise 2 for the reverse triangle inequality.
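
A numerical spot check is no substitute for the verifications left to the reader, but it can catch a misremembered identity. The sketch below tests every item of 4.4 on randomly chosen pairs of complex numbers; the function name `satisfies_4_4` and the tolerance are assumptions made for this illustration, and the tolerance is needed only because floating-point arithmetic is inexact.

```python
import random

def satisfies_4_4(w: complex, z: complex, tol: float = 1e-9) -> bool:
    """Check each item of 4.4 numerically for one pair (w, z)."""
    zc, wc = z.conjugate(), w.conjugate()
    checks = [
        abs((z + zc) - 2 * z.real) <= tol,                # sum of z and its conjugate
        abs((z - zc) - 2 * z.imag * 1j) <= tol,           # difference
        abs(z * zc - abs(z) ** 2) <= tol,                 # product
        abs((w + z).conjugate() - (wc + zc)) <= tol,      # additivity of conjugate
        abs((w * z).conjugate() - wc * zc) <= tol,        # multiplicativity of conjugate
        abs(zc.conjugate() - z) <= tol,                   # double conjugate
        abs(z.real) <= abs(z) + tol and abs(z.imag) <= abs(z) + tol,
        abs(abs(zc) - abs(z)) <= tol,                     # |conjugate| = |z|
        abs(abs(w * z) - abs(w) * abs(z)) <= tol,         # multiplicativity of |.|
        abs(w + z) <= abs(w) + abs(z) + tol,              # triangle inequality
    ]
    return all(checks)

random.seed(0)  # fixed seed so the run is reproducible
samples = [complex(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(200)]
all_hold = all(satisfies_4_4(w, z) for w, z in zip(samples[::2], samples[1::2]))
print(all_hold)
```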
Zeros of Polynomials


Recall that a function p: F → F is called a polynomial of degree m if there exist
a_0, …, a_m ∈ F with a_m ≠ 0 such that

\[ p(z) = a_0 + a_1 z + \cdots + a_m z^m \]

for all z ∈ F. A polynomial could have more than one degree if the representation
of p in the form above were not unique. Our first task is to show that this cannot
happen.

The solutions to the equation p(z) = 0 play a crucial role in the study of a
polynomial p ∈ P(F). Thus these solutions have a special name.


A number λ ∈ F is called a zero (or root) of a polynomial p ∈ P(F) if

\[ p(\lambda) = 0. \]


The next result is the key tool that we will use to show that the degree of a 
polynomial is unique. 


4.6 each zero of a polynomial corresponds to a degree-one factor


Suppose m is a positive integer and p ∈ P(F) is a polynomial of degree m.
Suppose λ ∈ F. Then p(λ) = 0 if and only if there exists a polynomial
q ∈ P(F) of degree m − 1 such that

\[ p(z) = (z - \lambda) q(z) \]

for every z ∈ F.


Proof First suppose p(λ) = 0. Let a_0, a_1, …, a_m ∈ F be such that

\[ p(z) = a_0 + a_1 z + \cdots + a_m z^m \]

for all z ∈ F. Then

4.7   \[ p(z) = p(z) - p(\lambda) = a_1(z - \lambda) + \cdots + a_m(z^m - \lambda^m) \]

for all z ∈ F. For each k ∈ {1, …, m}, the equation

\[ z^k - \lambda^k = (z - \lambda) \sum_{j=1}^{k} \lambda^{j-1} z^{k-j} \]

shows that z^k − λ^k equals z − λ times some polynomial of degree k − 1. Thus 4.7
shows that p equals z − λ times some polynomial of degree m − 1, as desired.

To prove the implication in the other direction, now suppose that there is
a polynomial q ∈ P(F) such that p(z) = (z − λ)q(z) for every z ∈ F. Then
p(λ) = (λ − λ)q(λ) = 0, as desired.
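
To see 4.6 in action, the quotient q can be computed by synthetic division (Horner's scheme). This is a standard technique, not the argument used in the proof above; the sketch assumes a polynomial is stored as a list of coefficients [a_0, a_1, …, a_m], and the function name `divide_by_linear` is hypothetical. The returned remainder equals p(λ), so it is 0 exactly when λ is a zero of p.

```python
def divide_by_linear(coeffs, lam):
    """Divide p(z) = coeffs[0] + coeffs[1] z + ... + coeffs[m] z^m by (z - lam).

    Returns (q, r) where q lists the coefficients of the quotient (lowest
    degree first) and r is the remainder, so p(z) = (z - lam) q(z) + r
    and r = p(lam).
    """
    m = len(coeffs) - 1
    q = [0] * m                  # quotient has degree m - 1
    carry = coeffs[m]            # leading coefficient of p
    for k in range(m - 1, -1, -1):
        q[k] = carry             # coefficient of z^k in the quotient
        carry = coeffs[k] + lam * carry
    return q, carry              # final carry is the remainder p(lam)

# p(z) = z^3 - 6z^2 + 11z - 6 = (z - 1)(z - 2)(z - 3)
p = [-6, 11, -6, 1]
print(divide_by_linear(p, 2))    # ([3, -4, 1], 0): quotient z^2 - 4z + 3, remainder 0
print(divide_by_linear(p, 4))    # ([3, -2, 1], 6): remainder 6 = p(4), so 4 is not a zero
```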
Now we can prove that polynomials do not have too many zeros.


4.8 degree m implies at most m zeros


Suppose m is a positive integer and p ∈ P(F) is a polynomial of degree m.
Then p has at most m zeros in F.


Proof We will use induction on m. The desired result holds if m = 1 because
if a_1 ≠ 0 then the polynomial a_0 + a_1 z has only one zero (which equals −a_0/a_1).
Thus assume that m > 1 and the desired result holds for m − 1.

If p has no zeros in F, then the desired result holds and we are done. Thus
suppose p has a zero λ ∈ F. By 4.6, there is a polynomial q ∈ P(F) of degree
m − 1 such that

\[ p(z) = (z - \lambda) q(z) \]

for every z ∈ F. Our induction hypothesis implies that q has at most m − 1 zeros
in F. The equation above shows that the zeros of p in F are exactly the zeros of q
in F along with λ. Thus p has at most m zeros in F.


The result above implies that the coefficients of a polynomial are uniquely 
determined (because if a polynomial had two different sets of coefficients, then 
subtracting the two representations of the polynomial would give a polynomial 
with some nonzero coefficients but infinitely many zeros). In particular, the degree 
of a polynomial is uniquely defined. 


Recall that the degree of the 0 polynomial is defined to be −∞. When
necessary, use the expected arithmetic with −∞. For example, −∞ < m and
−∞ + m = −∞ for every integer m.

The 0 polynomial is declared to have degree −∞ so that exceptions are not
needed for various reasonable results such as deg(pq) = deg p + deg q.


Division Algorithm for Polynomials 


If p and s are nonnegative integers, with s ≠ 0, then there exist nonnegative
integers q and r such that

\[ p = sq + r \]

and r < s. Think of dividing p by s, getting quotient q with remainder r. Our next
result gives an analogous result for polynomials. Thus the next result is often
called the division algorithm for polynomials, although as stated here it is not
really an algorithm, just a useful result.

The division algorithm for polynomials could be proved without using any
linear algebra. However, as is appropriate for a linear algebra textbook, the proof
given here uses linear algebra techniques and makes nice use of a basis of P_n(F),
which is the (n + 1)-dimensional vector space of polynomials with coefficients
in F and of degree at most n.

Think of the division algorithm for polynomials as giving a remainder polynomial r
when the polynomial p is divided by the polynomial s.


4.9 division algorithm for polynomials


Suppose that p, s ∈ P(F), with s ≠ 0. Then there exist unique polynomials
q, r ∈ P(F) such that

\[ p = sq + r \]

and deg r < deg s.


Proof Let n = deg p and let m = deg s. If n < m, then take q = 0 and r = p to
get the desired equation p = sq + r with deg r < deg s. Thus we now assume that
n ≥ m.

The list

4.10   \[ 1, z, \dots, z^{m-1}, s, zs, \dots, z^{n-m}s \]

is linearly independent in P_n(F) because each polynomial in this list has a different
degree. Also, the list 4.10 has length n + 1, which equals dim P_n(F). Hence 4.10
is a basis of P_n(F) [by 2.38].

Because p ∈ P_n(F) and 4.10 is a basis of P_n(F), there exist unique constants
a_0, a_1, …, a_{m−1} ∈ F and b_0, b_1, …, b_{n−m} ∈ F such that

4.11
\[
\begin{aligned}
p &= a_0 + a_1 z + \cdots + a_{m-1} z^{m-1} + b_0 s + b_1 zs + \cdots + b_{n-m} z^{n-m} s \\
&= \underbrace{a_0 + a_1 z + \cdots + a_{m-1} z^{m-1}}_{r} + s\,\underbrace{(b_0 + b_1 z + \cdots + b_{n-m} z^{n-m})}_{q}.
\end{aligned}
\]

With r and q as defined above, we see that p can be written as p = sq + r with
deg r < deg s, as desired.

The uniqueness of q, r ∈ P(F) satisfying these conditions follows from the
uniqueness of the constants a_0, a_1, …, a_{m−1} ∈ F and b_0, b_1, …, b_{n−m} ∈ F
satisfying 4.11.
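
The proof above shows that q and r exist without saying how to compute them. One way to compute them is classical long division, sketched below; this is the schoolbook procedure, not the basis argument of the proof. The sketch assumes polynomials are stored as coefficient lists [a_0, a_1, …] with the zero polynomial written as the empty list, uses exact rational arithmetic via `fractions.Fraction`, and the function name `poly_divmod` is hypothetical.

```python
from fractions import Fraction

def poly_divmod(p, s):
    """Return (q, r) with p = s*q + r and deg r < deg s.

    Polynomials are coefficient lists, lowest degree first; [] is the 0 polynomial.
    """
    r = [Fraction(c) for c in p]
    s = [Fraction(c) for c in s]
    while s and s[-1] == 0:          # drop zero leading coefficients of s
        s.pop()
    if not s:
        raise ZeroDivisionError("s must be a nonzero polynomial")
    while r and r[-1] == 0:          # drop zero leading coefficients of p
        r.pop()
    q = [Fraction(0)] * max(len(r) - len(s) + 1, 0)
    while len(r) >= len(s):
        shift = len(r) - len(s)      # degree of the next quotient term
        factor = r[-1] / s[-1]       # chosen to cancel the leading term of r
        q[shift] = factor
        for i, c in enumerate(s):
            r[shift + i] -= factor * c
        while r and r[-1] == 0:      # the leading term is now 0; trim it
            r.pop()
    return q, r

# (x^3 - 2x^2 + 4) divided by (x - 3): quotient x^2 + x + 3, remainder 13
quotient, remainder = poly_divmod([4, 0, -2, 1], [-3, 1])
print(quotient == [3, 1, 1], remainder == [13])
```

When deg p < deg s the loop never runs, which matches the first case of the proof: q = 0 and r = p.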


Factorization of Polynomials over C 


We have been handling polynomials with complex coefficients and polynomials
with real coefficients simultaneously, letting F denote R or C. Now we will
see differences between these two cases. First we treat polynomials with complex
coefficients. Then we will use those results to prove corresponding results for
polynomials with real coefficients.

The fundamental theorem of algebra is an existence theorem. Its proof does
not lead to a method for finding zeros. The quadratic formula gives the zeros
explicitly for polynomials of degree 2. Similar but more complicated formulas
exist for polynomials of degree 3 and 4. No such formulas exist for polynomials
of degree 5 and above.

Our proof of the fundamental theorem of algebra implicitly uses the result that a
continuous real-valued function on a closed disk in R² attains a minimum value.
A web search can lead you to several other proofs of the fundamental theorem of
algebra. The proof using Liouville’s theorem is particularly nice if you are
comfortable with analytic functions. All proofs of the fundamental theorem of
algebra need to use some analysis, because the result is not true if C is replaced,
for example, with the set of numbers of the form c + di where c, d are rational
numbers.


4.12 fundamental theorem of algebra, first version 


Every nonconstant polynomial with complex coefficients has a zero in C. 


Proof De Moivre’s theorem, which you can prove using induction on k and the
addition formulas for cosine and sine, states that if k is a positive integer and
θ ∈ R, then

\[ (\cos\theta + i\sin\theta)^k = \cos k\theta + i\sin k\theta. \]

Suppose w ∈ C and k is a positive integer. Using polar coordinates, we know
that there exist r ≥ 0 and θ ∈ R such that

\[ r(\cos\theta + i\sin\theta) = w. \]

De Moivre’s theorem implies that

\[ \Bigl(r^{1/k}\bigl(\cos\tfrac{\theta}{k} + i\sin\tfrac{\theta}{k}\bigr)\Bigr)^k = w. \]

Thus every complex number has a k-th root, a fact that we will soon use.

Suppose p is a nonconstant polynomial with complex coefficients and highest-order
nonzero term c_m z^m. Then |p(z)| → ∞ as |z| → ∞ (because |p(z)|/|z^m| → |c_m|
as |z| → ∞). Thus the continuous function z ↦ |p(z)| has a global minimum at
some point ζ ∈ C. To show that p(ζ) = 0, suppose that p(ζ) ≠ 0.

Define a new polynomial q by

\[ q(z) = \frac{p(z + \zeta)}{p(\zeta)}. \]

The function z ↦ |q(z)| has a global minimum value of 1 at z = 0. Write

\[ q(z) = 1 + a_k z^k + \cdots + a_m z^m, \]

where k is the smallest positive integer such that the coefficient of z^k is nonzero;
in other words, a_k ≠ 0.

Let β ∈ C be such that β^k = −1/a_k. There is a constant c > 1 such that if
t ∈ (0, 1), then

\[
\begin{aligned}
|q(t\beta)| &\le |1 + a_k t^k \beta^k| + t^{k+1} c \\
&= 1 - t^k(1 - tc).
\end{aligned}
\]

Thus taking t to be 1/(2c) in the inequality above, we have |q(tβ)| < 1, which
contradicts the assumption that the global minimum of z ↦ |q(z)| is 1. This
contradiction implies that p(ζ) = 0, showing that p has a zero, as desired.
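
The k-th root constructed at the start of the proof is easy to compute. The sketch below follows the polar-coordinate formula r^{1/k}(cos(θ/k) + i sin(θ/k)) using Python's standard `cmath` module; the function name `kth_root` is an assumption made for this illustration, and the results are floating-point approximations.

```python
import cmath

def kth_root(w: complex, k: int) -> complex:
    """Return one k-th root of w via polar coordinates and De Moivre's theorem."""
    r, theta = cmath.polar(w)                     # w = r(cos theta + i sin theta), r >= 0
    return cmath.rect(r ** (1.0 / k), theta / k)  # r^(1/k)(cos(theta/k) + i sin(theta/k))

root = kth_root(-8, 3)
print(root)         # close to 1 + 1.732i
print(root ** 3)    # close to -8
```

Other k-th roots of a nonzero w arise by adding multiples of 2π/k to the angle; the proof needs only one.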
Computers can use clever numerical methods to find good approximations to
the zeros of any polynomial, even when exact zeros cannot be found. For example,
no one will ever give an exact formula for a zero of the polynomial p defined by

\[ p(x) = x^5 - 5x^4 - 6x^3 + 17x^2 + 4x - 7. \]

However, a computer can find that the zeros of p are approximately the five
numbers −1.87, −0.74, 0.62, 1.47, 5.51.
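
One simple numerical method is bisection, sketched below for the polynomial p above. This is an illustrative choice, not necessarily what production software does: the scan interval [−10, 10], the step count, and the function names `bisect` and `real_zeros` are all assumptions made here. Bisection finds only real zeros at which the polynomial changes sign, which is enough for this example.

```python
def p(x):
    return x**5 - 5*x**4 - 6*x**3 + 17*x**2 + 4*x - 7

def bisect(f, lo, hi, tol=1e-10):
    """Approximate a zero of f in [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # a zero lies in the left half
        else:
            lo, f_lo = mid, f_mid    # a zero lies in the right half
    return (lo + hi) / 2

def real_zeros(f, a, b, steps=2000):
    """Scan [a, b] in equal steps and bisect every subinterval with a sign change."""
    width = (b - a) / steps
    zeros = []
    for i in range(steps):
        lo, hi = a + i * width, a + (i + 1) * width
        if f(lo) * f(hi) < 0:
            zeros.append(bisect(f, lo, hi))
    return zeros

approx_zeros = real_zeros(p, -10.0, 10.0)
print([round(x, 2) for x in approx_zeros])
```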

The first version of the fundamental theorem of algebra leads to the following
factorization result for polynomials with complex coefficients. Note that in this
factorization, the zeros of p are the numbers λ_1, …, λ_m, which are the only values
of z for which the right side of the equation in the next result equals 0.


4.13 fundamental theorem of algebra, second version


If p ∈ P(C) is a nonconstant polynomial, then p has a unique factorization
(except for the order of the factors) of the form

\[ p(z) = c(z - \lambda_1) \cdots (z - \lambda_m), \]

where c, λ_1, …, λ_m ∈ C.


Proof Let p ∈ P(C) and let m = deg p. We will use induction on m. If m = 1,
then the desired factorization exists and is unique. So assume that m > 1 and that
the desired factorization exists and is unique for all polynomials of degree m − 1.

First we will show that the desired factorization of p exists. By the first version
of the fundamental theorem of algebra (4.12), p has a zero λ ∈ C. By 4.6, there
is a polynomial q of degree m − 1 such that

\[ p(z) = (z - \lambda) q(z) \]

for all z ∈ C. Our induction hypothesis implies that q has the desired factorization,
which when plugged into the equation above gives the desired factorization of p.

Now we turn to the question of uniqueness. The number c is uniquely determined
as the coefficient of z^m in p. So we only need to show that except for the
order, there is only one way to choose λ_1, …, λ_m. If

\[ (z - \lambda_1) \cdots (z - \lambda_m) = (z - \tau_1) \cdots (z - \tau_m) \]

for all z ∈ C, then because the left side of the equation above equals 0 when
z = λ_1, one of the τ’s on the right side equals λ_1. Relabeling, we can assume
that τ_1 = λ_1. Now if z ≠ λ_1, we can divide both sides of the equation above by
z − λ_1, getting

\[ (z - \lambda_2) \cdots (z - \lambda_m) = (z - \tau_2) \cdots (z - \tau_m) \]

for all z ∈ C except possibly z = λ_1. Actually the equation above holds for all
z ∈ C, because otherwise by subtracting the right side from the left side we would
get a nonzero polynomial that has infinitely many zeros. The equation above and
our induction hypothesis imply that except for the order, the λ’s are the same as
the τ’s, completing the proof of uniqueness.


Factorization of Polynomials over R


A polynomial with real coefficients may have no real zeros. For example, the
polynomial 1 + x² has no real zeros.

To obtain a factorization theorem over R, we will use our factorization theorem
over C. We begin with the next result.

The failure of the fundamental theorem of algebra for R accounts for the
differences between linear algebra on real and complex vector spaces, as we will
see in later chapters.


4.14 polynomials with real coefficients have nonreal zeros in pairs


Suppose p ∈ P(C) is a polynomial with real coefficients. If λ ∈ C is a zero
of p, then so is λ̄.


Proof Let

\[ p(z) = a_0 + a_1 z + \cdots + a_m z^m, \]

where a_0, …, a_m are real numbers. Suppose λ ∈ C is a zero of p. Then

\[ a_0 + a_1 \lambda + \cdots + a_m \lambda^m = 0. \]

Take the complex conjugate of both sides of this equation, obtaining

\[ a_0 + a_1 \bar{\lambda} + \cdots + a_m \bar{\lambda}^m = 0, \]

where we have used basic properties of the complex conjugate (see 4.4). The
equation above shows that λ̄ is a zero of p.


We want a factorization theorem for polynomials with real coefficients. We
begin with the following result.

Think about the quadratic formula in connection with the result below.


4.15 factorization of a quadratic polynomial


Suppose b, c ∈ R. Then there is a polynomial factorization of the form

\[ x^2 + bx + c = (x - \lambda_1)(x - \lambda_2) \]

with λ_1, λ_2 ∈ R if and only if b² ≥ 4c.


Proof Notice that

\[ x^2 + bx + c = \Bigl(x + \frac{b}{2}\Bigr)^2 + \Bigl(c - \frac{b^2}{4}\Bigr). \]

The equation above is the basis of the technique called completing the square.

First suppose b² < 4c. Then the right side of the equation above is positive for
every x ∈ R. Hence the polynomial x² + bx + c has no real zeros and thus
cannot be factored in the form (x − λ_1)(x − λ_2) with λ_1, λ_2 ∈ R.

Conversely, now suppose b² ≥ 4c. Then there is a real number d such that
d² = b²/4 − c. From the displayed equation above, we have

\[
\begin{aligned}
x^2 + bx + c &= \Bigl(x + \frac{b}{2}\Bigr)^2 - d^2 \\
&= \Bigl(x + \frac{b}{2} + d\Bigr)\Bigl(x + \frac{b}{2} - d\Bigr),
\end{aligned}
\]

which gives the desired factorization.
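
The proof translates directly into a small computation. In the sketch below, the function name `factor_real_quadratic` is hypothetical; it returns the pair (λ_1, λ_2) = (−b/2 − d, −b/2 + d) read off from the factorization above, or `None` when b² < 4c. Floating-point arithmetic is assumed, so the outputs are approximations in general.

```python
import math

def factor_real_quadratic(b: float, c: float):
    """Return (lam1, lam2) with x^2 + bx + c = (x - lam1)(x - lam2), or None if b^2 < 4c."""
    discriminant = b * b - 4 * c
    if discriminant < 0:
        return None                      # no real factorization exists (4.15)
    d = math.sqrt(discriminant) / 2      # d^2 = b^2/4 - c, as in the proof
    return (-b / 2 - d, -b / 2 + d)

print(factor_real_quadratic(-5, 6))      # (2.0, 3.0): x^2 - 5x + 6 = (x - 2)(x - 3)
print(factor_real_quadratic(0, 1))       # None: x^2 + 1 has no real zeros
```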


The next result gives a factorization of a polynomial over R. The idea of
the proof is to use the second version of the fundamental theorem of algebra
(4.13), which gives a factorization of p as a polynomial with complex coefficients.
Complex but nonreal zeros of p come in pairs; see 4.14. Thus if the factorization
of p as an element of P(C) includes terms of the form (x − λ) with λ a nonreal
complex number, then (x − λ̄) is also a term in the factorization. Multiplying
together these two terms, we get

\[ x^2 - 2(\operatorname{Re}\lambda)x + |\lambda|^2, \]

which is a quadratic term of the required form.

The idea sketched in the paragraph above almost provides a proof of the
existence of our desired factorization. However, we need to be careful about
one point. Suppose λ is a nonreal complex number and (x − λ) is a term in the
factorization of p as an element of P(C). We are guaranteed by 4.14 that (x − λ̄)
also appears as a term in the factorization, but 4.14 does not state that these two
factors appear the same number of times, as needed to make the idea above work.
However, the proof works around this point.

In the next result, either m or M may equal 0. The numbers λ_1, …, λ_m are
precisely the real zeros of p, for these are the only real values of x for which the
right side of the equation in the next result equals 0.


4.16 factorization of a polynomial over R


Suppose p ∈ P(R) is a nonconstant polynomial. Then p has a unique factorization
(except for the order of the factors) of the form

\[ p(x) = c(x - \lambda_1) \cdots (x - \lambda_m)(x^2 + b_1 x + c_1) \cdots (x^2 + b_M x + c_M), \]

where c, λ_1, …, λ_m, b_1, …, b_M, c_1, …, c_M ∈ R, with b_k² < 4c_k for each k.


Proof First we will prove that the desired factorization exists, and after that we
will prove the uniqueness.

Think of p as an element of P(C). If all (complex) zeros of p are real, then
we have the desired factorization by 4.13. Thus suppose p has a zero λ ∈ C with
λ ∉ R. By 4.14, λ̄ is a zero of p. Thus we can write

\[
\begin{aligned}
p(x) &= (x - \lambda)(x - \bar{\lambda}) q(x) \\
&= \bigl(x^2 - 2(\operatorname{Re}\lambda)x + |\lambda|^2\bigr) q(x)
\end{aligned}
\]

for some polynomial q ∈ P(C) of degree two less than the degree of p. If we
can prove that q has real coefficients, then using induction on the degree of p
completes the proof of the existence part of this result.

To prove that q has real coefficients, we solve the equation above for q, getting

\[ q(x) = \frac{p(x)}{x^2 - 2(\operatorname{Re}\lambda)x + |\lambda|^2} \]

for all x ∈ R. The equation above implies that q(x) ∈ R for all x ∈ R. Writing

\[ q(x) = a_0 + a_1 x + \cdots + a_{n-2} x^{n-2}, \]

where n = deg p and a_0, …, a_{n−2} ∈ C, we thus have

\[ 0 = \operatorname{Im} q(x) = (\operatorname{Im} a_0) + (\operatorname{Im} a_1)x + \cdots + (\operatorname{Im} a_{n-2})x^{n-2} \]

for all x ∈ R. This implies that Im a_0, …, Im a_{n−2} all equal 0 (by 4.8). Thus all
coefficients of q are real, as desired. Hence the desired factorization exists.

Now we turn to the question of uniqueness of our factorization. A factor of p
of the form x² + b_k x + c_k with b_k² < 4c_k can be uniquely written as
(x − λ_k)(x − λ̄_k) with λ_k ∈ C. A moment’s thought shows that two different
factorizations of p as an element of P(R) would lead to two different
factorizations of p as an element of P(C), contradicting 4.13.


Exercises 4 


1 Suppose w, z ∈ C. Verify the following equalities and inequalities.
(a) z + z̄ = 2 Re z
(b) z − z̄ = 2(Im z)i
(c) z z̄ = |z|²
(d) \( \overline{w + z} = \bar{w} + \bar{z} \) and \( \overline{wz} = \bar{w}\,\bar{z} \)
(e) \( \bar{\bar{z}} = z \)
(f) |Re z| ≤ |z| and |Im z| ≤ |z|
(g) |z̄| = |z|
(h) |wz| = |w| |z|
The results above are the parts of 4.4 that were left to the reader.


2 Prove that if w, z ∈ C, then | |w| − |z| | ≤ |w − z|.

The inequality above is called the reverse triangle inequality.


Suppose V is a complex vector space and φ ∈ V′. Define σ: V → R by
σ(v) = Re φ(v) for each v ∈ V. Show that

\[ \varphi(v) = \sigma(v) - i\sigma(iv) \]

for all v ∈ V.
Suppose m is a positive integer. Is the set 
{0} U {p € P(F) : degp = m} 
a subspace of P(F)? 


Is the set 
{0} U {p € P(F) : deg p is even} 


a subspace of P(F)? 


Suppose that m and n are positive integers with m ≤ n, and suppose
λ_1, …, λ_m ∈ F. Prove that there exists a polynomial p ∈ P(F) with
deg p = n such that 0 = p(λ_1) = ⋯ = p(λ_m) and such that p has no
other zeros.


Suppose that m is a nonnegative integer, z_1, …, z_{m+1} are distinct elements
of F, and w_1, …, w_{m+1} ∈ F. Prove that there exists a unique polynomial
p ∈ P_m(F) such that

\[ p(z_k) = w_k \]

for each k = 1, …, m + 1.


This result can be proved without using linear algebra. However, try to find 
the clearer, shorter proof that uses some linear algebra. 


Suppose p € P(C) has degree m. Prove that p has m distinct zeros if and 
only if p and its derivative p’ have no zeros in common. 


Prove that every polynomial of odd degree with real coefficients has a real 
zero. 


For p ∈ P(R), define Tp: R → R by

\[
(Tp)(x) =
\begin{cases}
\dfrac{p(x) - p(3)}{x - 3} & \text{if } x \ne 3, \\[1ex]
p'(3) & \text{if } x = 3
\end{cases}
\]

for each x ∈ R. Show that Tp ∈ P(R) for every polynomial p ∈ P(R) and
also show that T: P(R) → P(R) is a linear map.


Suppose p ∈ P(C). Define q: C → C by

\[ q(z) = p(z)\,\overline{p(\bar{z})}. \]

Prove that q is a polynomial with real coefficients.


12 Suppose m is a nonnegative integer and p ∈ P_m(C) is such that there are
distinct real numbers x_0, x_1, …, x_m with p(x_k) ∈ R for each k = 0, 1, …, m.
Prove that all coefficients of p are real.


13 Suppose p ∈ P(F) with p ≠ 0. Let U = {pq : q ∈ P(F)}.


(a) Show that dim P(F)/U = deg p. 
(b) Find a basis of P(F)/U. 


14 Suppose p, q ∈ P(C) are nonconstant polynomials with no zeros in common.
Let m = deg p and n = deg q. Use linear algebra as outlined below in (a)–(c)
to prove that there exist r ∈ P_{n−1}(C) and s ∈ P_{m−1}(C) such that

\[ rp + sq = 1. \]

(a) Define T: P_{n−1}(C) × P_{m−1}(C) → P_{m+n−1}(C) by

\[ T(r, s) = rp + sq. \]

Show that the linear map T is injective.

(b) Show that the linear map T in (a) is surjective.

(c) Use (b) to conclude that there exist r ∈ P_{n−1}(C) and s ∈ P_{m−1}(C)
such that rp + sq = 1.


Chapter 5
Eigenvalues and Eigenvectors


Linear maps from one vector space to another vector space were the objects of 
study in Chapter 3. Now we begin our investigation of operators, which are linear 
maps from a vector space to itself. Their study constitutes the most important 
part of linear algebra. 

To learn about an operator, we might try restricting it to a smaller subspace. 
Asking for that restriction to be an operator will lead us to the notion of invariant 
subspaces. Each one-dimensional invariant subspace arises from a vector that 
the operator maps into a scalar multiple of the vector. This path will lead us to 
eigenvectors and eigenvalues. 

We will then prove one of the most important results in linear algebra: every 
operator on a finite-dimensional nonzero complex vector space has an eigenvalue. 
This result will allow us to show that for each operator on a finite-dimensional 
complex vector space, there is a basis of the vector space with respect to which 
the matrix of the operator has at least almost half its entries equal to 0. 


standing assumptions for this chapter 


• F denotes R or C.
• V denotes a vector space over F.


Statue of Leonardo of Pisa (1170–1250, approximate dates), also known as Fibonacci.
Exercise 21 in Section 5D shows how linear algebra can be used to find
the explicit formula for the Fibonacci sequence shown on the front cover.


5A Invariant Subspaces


Eigenvalues 


A linear map from a vector space to itself is called an operator. 


Suppose T ∈ L(V). If m ≥ 2 and

\[ V = V_1 \oplus \cdots \oplus V_m, \]

where each V_k is a nonzero subspace of V, then to understand the behavior of
T we only need to understand the behavior of each T|_{V_k}; here T|_{V_k} denotes the
restriction of T to the smaller domain V_k. Dealing with T|_{V_k} should be easier than
dealing with T because V_k is a smaller vector space than V.

Recall that we defined the notation L(V) to mean L(V, V).

However, if we intend to apply tools useful in the study of operators (such
as taking powers), then we have a problem: T|_{V_k} may not map V_k into itself; in
other words, T|_{V_k} may not be an operator on V_k. Thus we are led to consider only
decompositions of V of the form above in which T maps each V_k into itself. Hence
we now give a name to subspaces of V that get mapped into themselves by T.


definition: invariant subspace 


Suppose T ∈ L(V). A subspace U of V is called invariant under T if Tu ∈ U
for every u ∈ U.


Thus U is invariant under T if T|_U is an operator on U.


5.3 example: subspace invariant under differentiation operator 


Suppose that T ∈ L(P(R)) is defined by Tp = p′. Then P_4(R), which is a
subspace of P(R), is invariant under T because if p ∈ P(R) has degree at most 4,
then p′ also has degree at most 4.


5.4 example: four invariant subspaces, not necessarily all different 


If T ∈ L(V), then the following subspaces of V are all invariant under T.

{0}       The subspace {0} is invariant under T because if u ∈ {0}, then u = 0
          and hence Tu = 0 ∈ {0}.
V         The subspace V is invariant under T because if u ∈ V, then Tu ∈ V.
null T    The subspace null T is invariant under T because if u ∈ null T, then
          Tu = 0, and hence Tu ∈ null T.
range T   The subspace range T is invariant under T because if u ∈ range T,
          then Tu ∈ range T.

Must an operator T ∈ L(V) have any invariant subspaces other than {0}
and V? Later we will see that this question has an affirmative answer if V is
finite-dimensional and dim V > 1 (for F = C) or dim V > 2 (for F = R); see
5.19 and Exercise 29 in Section 5B.

The previous example noted that null T and range T are invariant under T. 
However, these subspaces do not necessarily provide easy answers to the question 
above about the existence of invariant subspaces other than {0} and V, because 
null T may equal {0} and range T may equal V (this happens when T is invertible). 

We will return later to a deeper study of invariant subspaces. Now we turn to 
an investigation of the simplest possible nontrivial invariant subspaces—invariant 
subspaces of dimension one. 

Take any v ∈ V with v ≠ 0 and let U equal the set of all scalar multiples of v:

\[ U = \{\lambda v : \lambda \in \mathbf{F}\} = \operatorname{span}(v). \]

Then U is a one-dimensional subspace of V (and every one-dimensional subspace
of V is of this form for an appropriate choice of v). If U is invariant under an
operator T ∈ L(V), then Tv ∈ U, and hence there is a scalar λ ∈ F such that

\[ Tv = \lambda v. \]

Conversely, if Tv = λv for some λ ∈ F, then span(v) is a one-dimensional
subspace of V invariant under T.

The equation Tv = λv, which we have just seen is intimately connected with
one-dimensional invariant subspaces, is important enough that the scalars λ and
vectors v satisfying it are given special names.


5.5 definition: eigenvalue


Suppose T ∈ L(V). A number λ ∈ F is called an eigenvalue of T if there
exists v ∈ V such that v ≠ 0 and Tv = λv.


In the definition above, we require that v ≠ 0 because every scalar λ ∈ F
satisfies T0 = λ0.

The comments above show that V has a one-dimensional subspace invariant
under T if and only if T has an eigenvalue.

The word eigenvalue is half-German, half-English. The German prefix eigen
means “own” in the sense of characterizing an intrinsic property.


5.6 example: eigenvalue 


Define an operator T ∈ L(F³) by

\[ T(x, y, z) = (7x + 3z,\ 3x + 6y + 9z,\ -6y) \]

for (x, y, z) ∈ F³. Then T(3, 1, −1) = (18, 6, −6) = 6(3, 1, −1). Thus 6 is an
eigenvalue of T.
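
The arithmetic in this example is easy to confirm by machine. The sketch below encodes the operator as an ordinary Python function on triples; the names `T`, `v`, and `image` are chosen here for illustration.

```python
def T(vector):
    """The operator of Example 5.6: T(x, y, z) = (7x + 3z, 3x + 6y + 9z, -6y)."""
    x, y, z = vector
    return (7 * x + 3 * z, 3 * x + 6 * y + 9 * z, -6 * y)

v = (3, 1, -1)
image = T(v)
print(image)                                 # (18, 6, -6)
print(image == tuple(6 * c for c in v))      # True, so Tv = 6v and 6 is an eigenvalue
```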
The equivalences in the next result, along with many deep results in linear
algebra, are valid only in the context of finite-dimensional vector spaces.


5.7 equivalent conditions to be an eigenvalue


Suppose V is finite-dimensional, T ∈ L(V), and λ ∈ F. Then the following
are equivalent.

(a) λ is an eigenvalue of T.
(b) T − λI is not injective.
(c) T − λI is not surjective.
(d) T − λI is not invertible.

Reminder: I ∈ L(V) is the identity operator. Thus Iv = v for all v ∈ V.


Proof Conditions (a) and (b) are equivalent because the equation Tv = λv
is equivalent to the equation (T − λI)v = 0. Conditions (b), (c), and (d) are
equivalent by 3.65.


5.8 definition: eigenvector


Suppose T ∈ L(V) and λ ∈ F is an eigenvalue of T. A vector v ∈ V is called
an eigenvector of T corresponding to λ if v ≠ 0 and Tv = λv.


In other words, a nonzero vector v ∈ V is an eigenvector of an operator
T ∈ L(V) if and only if Tv is a scalar multiple of v. Because Tv = λv if and only
if (T − λI)v = 0, a vector v ∈ V with v ≠ 0 is an eigenvector of T corresponding
to λ if and only if v ∈ null(T − λI).


5.9 example: eigenvalues and eigenvectors 


Suppose T ∈ L(F²) is defined by T(w, z) = (−z, w).

(a) First consider the case F = R. Then T is a counterclockwise rotation by 90°
about the origin in R². An operator has an eigenvalue if and only if there
exists a nonzero vector in its domain that gets sent by the operator to a scalar
multiple of itself. A 90° counterclockwise rotation of a nonzero vector in R²
cannot equal a scalar multiple of itself. Conclusion: if F = R, then T has no
eigenvalues (and thus has no eigenvectors).

(b) Now consider the case F = C. To find eigenvalues of T, we must find the
scalars λ such that T(w, z) = λ(w, z) has some solution other than w = z = 0.
The equation T(w, z) = λ(w, z) is equivalent to the simultaneous equations

5.10   \[ -z = \lambda w, \qquad w = \lambda z. \]

Substituting the value for w given by the second equation into the first equation
gives

\[ -z = \lambda^2 z. \]

Now z cannot equal 0 [otherwise 5.10 implies that w = 0; we are looking for
solutions to 5.10 such that (w, z) is not the 0 vector], so the equation above
leads to the equation

\[ -1 = \lambda^2. \]

The solutions to this equation are λ = i and λ = −i.

You can verify that i and −i are eigenvalues of T. Indeed, the eigenvectors
corresponding to the eigenvalue i are the vectors of the form (w, −wi), with
w ∈ C and w ≠ 0. Furthermore, the eigenvectors corresponding to the
eigenvalue −i are the vectors of the form (w, wi), with w ∈ C and w ≠ 0.
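
The verification suggested at the end of this example can also be done numerically. In the sketch below, `T` encodes the operator T(w, z) = (−z, w) on pairs of Python complex numbers, and the helper `is_eigenpair` (a hypothetical name) tests whether a nonzero vector v satisfies Tv = λv up to a small floating-point tolerance.

```python
def T(vector):
    """The operator of Example 5.9: T(w, z) = (-z, w)."""
    w, z = vector
    return (-z, w)

def is_eigenpair(lam, vector, tol=1e-12):
    """Return True if vector is nonzero and T(vector) equals lam * vector."""
    nonzero = any(abs(c) > 0 for c in vector)
    matches = all(abs(t - lam * c) <= tol for t, c in zip(T(vector), vector))
    return nonzero and matches

w = 2 + 1j                                   # any nonzero complex number works here
print(is_eigenpair(1j, (w, -w * 1j)))        # True: (w, -wi) is an eigenvector for i
print(is_eigenpair(-1j, (w, w * 1j)))        # True: (w, wi) is an eigenvector for -i
print(is_eigenpair(2, (1, 0)))               # False: T(1, 0) = (0, 1) is not 2(1, 0)
```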


In the next proof, we again use the equivalence

\[ Tv = \lambda v \iff (T - \lambda I)v = 0. \]


5.11 linearly independent eigenvectors


Suppose T ∈ L(V). Then every list of eigenvectors of T corresponding to
distinct eigenvalues of T is linearly independent.


Proof Suppose the desired result is false. Then there exists a smallest positive
integer m such that there exists a linearly dependent list v_1, …, v_m of eigenvectors
of T corresponding to distinct eigenvalues λ_1, …, λ_m of T (note that m ≥ 2 because
an eigenvector is, by definition, nonzero). Thus there exist a_1, …, a_m ∈ F, none of
which are 0 (because of the minimality of m), such that

\[ a_1 v_1 + \cdots + a_m v_m = 0. \]

Apply T − λ_m I to both sides of the equation above, getting

\[ a_1(\lambda_1 - \lambda_m)v_1 + \cdots + a_{m-1}(\lambda_{m-1} - \lambda_m)v_{m-1} = 0. \]

Because the eigenvalues λ_1, …, λ_m are distinct, none of the coefficients above
equal 0. Thus v_1, …, v_{m−1} is a linearly dependent list of m − 1 eigenvectors of T
corresponding to distinct eigenvalues, contradicting the minimality of m. This
contradiction completes the proof.


The result above leads to a short proof of the result below, which puts an upper 
bound on the number of distinct eigenvalues that an operator can have. 


5.12 operator cannot have more eigenvalues than dimension of vector space 


Suppose V is finite-dimensional. Then each operator on V has at most dim V 
distinct eigenvalues. 


Proof Let $T \in \mathcal{L}(V)$. Suppose $\lambda_1, \ldots, \lambda_m$ are distinct eigenvalues of $T$. Let
$v_1, \ldots, v_m$ be corresponding eigenvectors. Then 5.11 implies that the list $v_1, \ldots, v_m$
is linearly independent. Thus $m \leq \dim V$ (see 2.22), as desired.
Polynomials Applied to Operators


The main reason that a richer theory exists for operators (which map a vector 
space into itself) than for more general linear maps is that operators can be raised 
to powers. In this subsection we define that notion and the concept of applying a 
polynomial to an operator. This concept will be the key tool that we use in the 
next section when we prove that every operator on a nonzero finite-dimensional 
complex vector space has an eigenvalue. 

If $T$ is an operator, then $TT$ makes sense (see 3.7) and is also an operator on
the same vector space as $T$. We usually write $T^2$ instead of $TT$. More generally,
we have the following definition of $T^m$.


5.13 notation: $T^m$


Suppose $T \in \mathcal{L}(V)$ and $m$ is a positive integer.

• $T^m \in \mathcal{L}(V)$ is defined by $T^m = \underbrace{T \cdots T}_{m \text{ times}}$.
• $T^0$ is defined to be the identity operator $I$ on $V$.
• If $T$ is invertible with inverse $T^{-1}$, then $T^{-m} \in \mathcal{L}(V)$ is defined by
  $T^{-m} = (T^{-1})^m.$


You should verify that if $T$ is an operator, then

$T^m T^n = T^{m+n}$ and $(T^m)^n = T^{mn},$

where $m$ and $n$ are arbitrary integers if $T$ is invertible and are nonnegative integers
if $T$ is not invertible.

Having defined powers of an operator, we can now define what it means to 
apply a polynomial to an operator. 


5.14 notation: p(T) 


Suppose $T \in \mathcal{L}(V)$ and $p \in \mathcal{P}(\mathbf{F})$ is a polynomial given by

$p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_m z^m$

for all $z \in \mathbf{F}$. Then $p(T)$ is the operator on $V$ defined by

$p(T) = a_0 I + a_1 T + a_2 T^2 + \cdots + a_m T^m.$


This is a new use of the symbol $p$ because we are applying $p$ to operators, not
just elements of $\mathbf{F}$. The idea here is that to evaluate $p(T)$, we simply replace $z$ with
$T$ in the expression defining $p$. Note that the constant term $a_0$ in $p(z)$ becomes the
operator $a_0 I$ (which is a reasonable choice because $a_0 = a_0 z^0$ and thus we should
replace $a_0$ with $a_0 T^0$, which equals $a_0 I$).
5.15 example: a polynomial applied to the differentiation operator


Suppose $D \in \mathcal{L}(\mathcal{P}(\mathbf{R}))$ is the differentiation operator defined by $Dq = q'$ and
$p$ is the polynomial defined by $p(x) = 7 - 3x + 5x^2$. Then $p(D) = 7I - 3D + 5D^2$.
Thus

$(p(D))q = 7q - 3q' + 5q''$

for every $q \in \mathcal{P}(\mathbf{R})$.
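
The example above can be mimicked on a computer. The sketch below represents a polynomial by its list of coefficients (constant term first), which is an assumed encoding, and the function names and the sample input $q(x) = x^3$ are hypothetical choices made for illustration.

```python
def derivative(q):
    """Differentiate q0 + q1*x + q2*x^2 + ... given as [q0, q1, q2, ...]."""
    return [k * q[k] for k in range(1, len(q))]

def add(f, g):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def apply_poly_to_D(p, q):
    """Return p(D)q = p0*q + p1*Dq + p2*D^2 q + ... as a coefficient list."""
    result = [0]
    power = q                         # D^0 q = q
    for coeff in p:
        result = add(result, [coeff * c for c in power])
        power = derivative(power)     # next power of D applied to q
    return result

p = [7, -3, 5]      # p(x) = 7 - 3x + 5x^2, as in the example
q = [0, 0, 0, 1]    # q(x) = x^3 (sample input, not from the text)
print(apply_poly_to_D(p, q))   # coefficients of 7q - 3q' + 5q''
```

For this sample input, $7q - 3q' + 5q'' = 7x^3 - 9x^2 + 30x$, so the printed coefficient list is `[0, 30, -9, 7]`.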


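If we fix an operator $T \in \mathcal{L}(V)$, then the function from $\mathcal{P}(\mathbf{F})$ to $\mathcal{L}(V)$ given
by $p \mapsto p(T)$ is linear, as you should verify.


5.16 definition: product of polynomials


If $p, q \in \mathcal{P}(\mathbf{F})$, then $pq \in \mathcal{P}(\mathbf{F})$ is the polynomial defined by

$(pq)(z) = p(z)q(z)$

for all $z \in \mathbf{F}$.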


The order does not matter in taking products of polynomials of a single 
operator, as shown by (b) in the next result. 


5.17 multiplicative properties 


Suppose $p, q \in \mathcal{P}(\mathbf{F})$ and $T \in \mathcal{L}(V)$. Then

(a) $(pq)(T) = p(T)q(T)$;
(b) $p(T)q(T) = q(T)p(T)$.

Informal proof: When a product of polynomials is expanded using the distributive
property, it does not matter whether the symbol is $z$ or $T$.

Proof
(a) Suppose $p(z) = \sum_{j=0}^{m} a_j z^j$ and $q(z) = \sum_{k=0}^{n} b_k z^k$ for all $z \in \mathbf{F}$. Then

$(pq)(z) = \sum_{j=0}^{m} \sum_{k=0}^{n} a_j b_k z^{j+k}.$

Thus

$(pq)(T) = \sum_{j=0}^{m} \sum_{k=0}^{n} a_j b_k T^{j+k} = \Bigl(\sum_{j=0}^{m} a_j T^j\Bigr)\Bigl(\sum_{k=0}^{n} b_k T^k\Bigr) = p(T)q(T).$

(b) Using (a) twice, we have $p(T)q(T) = (pq)(T) = (qp)(T) = q(T)p(T)$.
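
Both parts of the result above can be checked on a concrete matrix. The sketch below uses exact integer arithmetic on lists of lists; the matrix `A`, the polynomials `p` and `q`, and all function names are arbitrary choices made for illustration and are not taken from the text.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_of_matrix(p, A):
    """Evaluate p(A) = p0*I + p1*A + p2*A^2 + ... for p = [p0, p1, p2, ...]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]   # A^0 = I
    for coeff in p:
        result = [[result[i][j] + coeff * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = matmul(power, A)
    return result

def polymul(p, q):
    """Coefficients of the product polynomial pq."""
    out = [0] * (len(p) + len(q) - 1)
    for j, a in enumerate(p):
        for k, b in enumerate(q):
            out[j + k] += a * b
    return out

A = [[1, 2], [3, 4]]
p = [1, 0, 2]    # 1 + 2z^2
q = [-3, 5]      # -3 + 5z

lhs = poly_of_matrix(polymul(p, q), A)                               # (pq)(A)
assert lhs == matmul(poly_of_matrix(p, A), poly_of_matrix(q, A))     # part (a)
assert lhs == matmul(poly_of_matrix(q, A), poly_of_matrix(p, A))     # part (b)
```

Commutativity here depends on both factors being polynomials of the same matrix; two unrelated matrices generally do not commute.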
We observed earlier that if $T \in \mathcal{L}(V)$, then the subspaces $\operatorname{null} T$ and $\operatorname{range} T$
are invariant under $T$ (see 5.4). Now we show that the null space and the range of
every polynomial of $T$ are also invariant under $T$.


5.18 null space and range of p(T) are invariant under T 


Suppose $T \in \mathcal{L}(V)$ and $p \in \mathcal{P}(\mathbf{F})$. Then $\operatorname{null} p(T)$ and $\operatorname{range} p(T)$ are
invariant under $T$.


Proof Suppose $u \in \operatorname{null} p(T)$. Then $p(T)u = 0$. Thus

$(p(T))(Tu) = T(p(T)u) = T(0) = 0.$

Hence $Tu \in \operatorname{null} p(T)$. Thus $\operatorname{null} p(T)$ is invariant under $T$, as desired.
Suppose $u \in \operatorname{range} p(T)$. Then there exists $v \in V$ such that $u = p(T)v$. Thus

$Tu = T(p(T)v) = p(T)(Tv).$

Hence $Tu \in \operatorname{range} p(T)$. Thus $\operatorname{range} p(T)$ is invariant under $T$, as desired.


Exercises 5A 


1 Suppose $T \in \mathcal{L}(V)$ and $U$ is a subspace of $V$.

(a) Prove that if $U \subseteq \operatorname{null} T$, then $U$ is invariant under $T$.
(b) Prove that if $\operatorname{range} T \subseteq U$, then $U$ is invariant under $T$.

2 Suppose that $T \in \mathcal{L}(V)$ and $V_1, \ldots, V_m$ are subspaces of $V$ invariant under $T$.
Prove that $V_1 + \cdots + V_m$ is invariant under $T$.

3 Suppose $T \in \mathcal{L}(V)$. Prove that the intersection of every collection of
subspaces of $V$ invariant under $T$ is invariant under $T$.

4 Prove or give a counterexample: If $V$ is finite-dimensional and $U$ is a subspace
of $V$ that is invariant under every operator on $V$, then $U = \{0\}$ or $U = V$.

5 Suppose $T \in \mathcal{L}(\mathbf{R}^2)$ is defined by $T(x, y) = (-3y, x)$. Find the eigenvalues
of $T$.

6 Define $T \in \mathcal{L}(\mathbf{F}^2)$ by $T(w, z) = (z, w)$. Find all eigenvalues and eigenvectors
of $T$.

7 Define $T \in \mathcal{L}(\mathbf{F}^3)$ by $T(z_1, z_2, z_3) = (2z_2, 0, 5z_3)$. Find all eigenvalues and
eigenvectors of $T$.

8 Suppose $P \in \mathcal{L}(V)$ is such that $P^2 = P$. Prove that if $\lambda$ is an eigenvalue of $P$,
then $\lambda = 0$ or $\lambda = 1$.
9 Define $T \colon \mathcal{P}(\mathbf{R}) \to \mathcal{P}(\mathbf{R})$ by $Tp = p'$. Find all eigenvalues and eigenvectors
of $T$.

10 Define $T \in \mathcal{L}(\mathcal{P}_4(\mathbf{R}))$ by $(Tp)(x) = xp'(x)$ for all $x \in \mathbf{R}$. Find all eigenvalues
and eigenvectors of $T$.

11 Suppose $V$ is finite-dimensional, $T \in \mathcal{L}(V)$, and $\alpha \in \mathbf{F}$. Prove that there exists
$\delta > 0$ such that $T - \lambda I$ is invertible for all $\lambda \in \mathbf{F}$ such that $0 < |\alpha - \lambda| < \delta$.

12 Suppose $V = U \oplus W$, where $U$ and $W$ are nonzero subspaces of $V$. Define
$P \in \mathcal{L}(V)$ by $P(u + w) = u$ for each $u \in U$ and each $w \in W$. Find all
eigenvalues and eigenvectors of $P$.

13 Suppose $T \in \mathcal{L}(V)$. Suppose $S \in \mathcal{L}(V)$ is invertible.

(a) Prove that $T$ and $S^{-1}TS$ have the same eigenvalues.
(b) What is the relationship between the eigenvectors of $T$ and the eigenvectors
of $S^{-1}TS$?

14 Give an example of an operator on $\mathbf{R}^4$ that has no (real) eigenvalues.

15 Suppose $V$ is finite-dimensional, $T \in \mathcal{L}(V)$, and $\lambda \in \mathbf{F}$. Show that $\lambda$ is
an eigenvalue of $T$ if and only if $\lambda$ is an eigenvalue of the dual operator
$T' \in \mathcal{L}(V')$.


16 Suppose $v_1, \ldots, v_n$ is a basis of $V$ and $T \in \mathcal{L}(V)$. Prove that if $\lambda$ is an
eigenvalue of $T$, then

$|\lambda| \leq n \max\{\,|\mathcal{M}(T)_{j,k}| : 1 \leq j, k \leq n\,\},$

where $\mathcal{M}(T)_{j,k}$ denotes the entry in row $j$, column $k$ of the matrix of $T$ with
respect to the basis $v_1, \ldots, v_n$.

See Exercise 19 in Section 6A for a different bound on $|\lambda|$.

17 Suppose $\mathbf{F} = \mathbf{R}$, $T \in \mathcal{L}(V)$, and $\lambda \in \mathbf{R}$. Prove that $\lambda$ is an eigenvalue of $T$
if and only if $\lambda$ is an eigenvalue of the complexification $T_{\mathbf{C}}$.

See Exercise 33 in Section 3B for the definition of $T_{\mathbf{C}}$.

18 Suppose $\mathbf{F} = \mathbf{R}$, $T \in \mathcal{L}(V)$, and $\lambda \in \mathbf{C}$. Prove that $\lambda$ is an eigenvalue of
the complexification $T_{\mathbf{C}}$ if and only if $\overline{\lambda}$ is an eigenvalue of $T_{\mathbf{C}}$.

19 Show that the forward shift operator $T \in \mathcal{L}(\mathbf{F}^\infty)$ defined by

$T(z_1, z_2, \ldots) = (0, z_1, z_2, \ldots)$

has no eigenvalues.

20 Define the backward shift operator $S \in \mathcal{L}(\mathbf{F}^\infty)$ by

$S(z_1, z_2, z_3, \ldots) = (z_2, z_3, \ldots).$

(a) Show that every element of $\mathbf{F}$ is an eigenvalue of $S$.
(b) Find all eigenvectors of $S$.
21 Suppose $T \in \mathcal{L}(V)$ is invertible.

(a) Suppose $\lambda \in \mathbf{F}$ with $\lambda \neq 0$. Prove that $\lambda$ is an eigenvalue of $T$ if and
only if $\frac{1}{\lambda}$ is an eigenvalue of $T^{-1}$.
(b) Prove that $T$ and $T^{-1}$ have the same eigenvectors.

22 Suppose $T \in \mathcal{L}(V)$ and there exist nonzero vectors $u$ and $w$ in $V$ such that

$Tu = 3w$ and $Tw = 3u.$

Prove that 3 or $-3$ is an eigenvalue of $T$.

23 Suppose $V$ is finite-dimensional and $S, T \in \mathcal{L}(V)$. Prove that $ST$ and $TS$
have the same eigenvalues.

24 Suppose $A$ is an $n$-by-$n$ matrix with entries in $\mathbf{F}$. Define $T \in \mathcal{L}(\mathbf{F}^n)$ by
$Tx = Ax$, where elements of $\mathbf{F}^n$ are thought of as $n$-by-1 column vectors.

(a) Suppose the sum of the entries in each row of $A$ equals 1. Prove that 1
is an eigenvalue of $T$.
(b) Suppose the sum of the entries in each column of $A$ equals 1. Prove that
1 is an eigenvalue of $T$.

25 Suppose $T \in \mathcal{L}(V)$ and $u, w$ are eigenvectors of $T$ such that $u + w$ is also
an eigenvector of $T$. Prove that $u$ and $w$ are eigenvectors of $T$ corresponding
to the same eigenvalue.

26 Suppose $T \in \mathcal{L}(V)$ is such that every nonzero vector in $V$ is an eigenvector
of $T$. Prove that $T$ is a scalar multiple of the identity operator.

27 Suppose that $V$ is finite-dimensional and $k \in \{1, \ldots, \dim V - 1\}$. Suppose
$T \in \mathcal{L}(V)$ is such that every subspace of $V$ of dimension $k$ is invariant
under $T$. Prove that $T$ is a scalar multiple of the identity operator.

28 Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V)$. Prove that $T$ has at most
$1 + \dim \operatorname{range} T$ distinct eigenvalues.

29 Suppose $T \in \mathcal{L}(\mathbf{R}^3)$ and $-4$, $5$, and $\sqrt{7}$ are eigenvalues of $T$. Prove that
there exists $x \in \mathbf{R}^3$ such that $Tx - 9x = (-4, 5, \sqrt{7})$.

30 Suppose $T \in \mathcal{L}(V)$ and $(T - 2I)(T - 3I)(T - 4I) = 0$. Suppose $\lambda$ is an
eigenvalue of $T$. Prove that $\lambda = 2$ or $\lambda = 3$ or $\lambda = 4$.

31 Give an example of $T \in \mathcal{L}(\mathbf{R}^2)$ such that $T^4 = -I$.

32 Suppose $T \in \mathcal{L}(V)$ has no eigenvalues and $T^4 = I$. Prove that $T^2 = -I$.

33 Suppose $T \in \mathcal{L}(V)$ and $m$ is a positive integer.

(a) Prove that $T$ is injective if and only if $T^m$ is injective.
(b) Prove that $T$ is surjective if and only if $T^m$ is surjective.
34 Suppose $V$ is finite-dimensional and $v_1, \ldots, v_m \in V$. Prove that the list
$v_1, \ldots, v_m$ is linearly independent if and only if there exists $T \in \mathcal{L}(V)$ such
that $v_1, \ldots, v_m$ are eigenvectors of $T$ corresponding to distinct eigenvalues.

35 Suppose that $\lambda_1, \ldots, \lambda_n$ is a list of distinct real numbers. Prove that the
list $e^{\lambda_1 x}, \ldots, e^{\lambda_n x}$ is linearly independent in the vector space of real-valued
functions on $\mathbf{R}$.
Hint: Let $V = \operatorname{span}(e^{\lambda_1 x}, \ldots, e^{\lambda_n x})$, and define an operator $D \in \mathcal{L}(V)$ by
$Df = f'$. Find eigenvalues and eigenvectors of $D$.

36 Suppose that $\lambda_1, \ldots, \lambda_n$ is a list of distinct positive numbers. Prove that
the list $\cos(\lambda_1 x), \ldots, \cos(\lambda_n x)$ is linearly independent in the vector space of
real-valued functions on $\mathbf{R}$.

37 Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V)$. Define $\mathcal{A} \in \mathcal{L}(\mathcal{L}(V))$ by

$\mathcal{A}(S) = TS$

for each $S \in \mathcal{L}(V)$. Prove that the set of eigenvalues of $T$ equals the set of
eigenvalues of $\mathcal{A}$.

38 Suppose $V$ is finite-dimensional, $T \in \mathcal{L}(V)$, and $U$ is a subspace of $V$
invariant under $T$. The quotient operator $T/U \in \mathcal{L}(V/U)$ is defined by

$(T/U)(v + U) = Tv + U$

for each $v \in V$.

(a) Show that the definition of $T/U$ makes sense (which requires using the
condition that $U$ is invariant under $T$) and show that $T/U$ is an operator
on $V/U$.
(b) Show that each eigenvalue of $T/U$ is an eigenvalue of $T$.

39 Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V)$. Prove that $T$ has an eigenvalue
if and only if there exists a subspace of $V$ of dimension $\dim V - 1$ that
is invariant under $T$.

40 Suppose $S, T \in \mathcal{L}(V)$ and $S$ is invertible. Suppose $p \in \mathcal{P}(\mathbf{F})$ is a polynomial.
Prove that

$p(STS^{-1}) = S\,p(T)\,S^{-1}.$

41 Suppose $T \in \mathcal{L}(V)$ and $U$ is a subspace of $V$ invariant under $T$. Prove that
$U$ is invariant under $p(T)$ for every polynomial $p \in \mathcal{P}(\mathbf{F})$.

42 Define $T \in \mathcal{L}(\mathbf{F}^n)$ by $T(x_1, x_2, x_3, \ldots, x_n) = (x_1, 2x_2, 3x_3, \ldots, nx_n)$.

(a) Find all eigenvalues and eigenvectors of $T$.
(b) Find all subspaces of $\mathbf{F}^n$ that are invariant under $T$.

43 Suppose that $V$ is finite-dimensional, $\dim V > 1$, and $T \in \mathcal{L}(V)$. Prove that
$\{p(T) : p \in \mathcal{P}(\mathbf{F})\} \neq \mathcal{L}(V)$.
Existence of Eigenvalues on Complex Vector Spaces


Now we come to one of the central results about operators on finite-dimensional 
complex vector spaces. 


5.19 existence of eigenvalues 


Every operator on a finite-dimensional nonzero complex vector space has an 
eigenvalue. 


Proof Suppose $V$ is a finite-dimensional complex vector space of dimension
$n > 0$ and $T \in \mathcal{L}(V)$. Choose $v \in V$ with $v \neq 0$. Then

$v, Tv, T^2 v, \ldots, T^n v$

is not linearly independent, because $V$ has dimension $n$ and this list has length
$n + 1$. Hence some linear combination (with not all the coefficients equal to 0)
of the vectors above equals 0. Thus there exists a nonconstant polynomial $p$ of
smallest degree such that

$p(T)v = 0.$

By the first version of the fundamental theorem of algebra (see 4.12), there
exists $\lambda \in \mathbf{C}$ such that $p(\lambda) = 0$. Hence there exists a polynomial $q \in \mathcal{P}(\mathbf{C})$ such
that

$p(z) = (z - \lambda)q(z)$

for every $z \in \mathbf{C}$ (see 4.6). This implies (using 5.17) that

$0 = p(T)v = (T - \lambda I)(q(T)v).$

Because $q$ has smaller degree than $p$, we know that $q(T)v \neq 0$. Thus the equation
above implies that $\lambda$ is an eigenvalue of $T$ with eigenvector $q(T)v$.
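
To see the steps of this proof in a concrete case, the sketch below runs them by hand for the operator $T(w, z) = (-z, w)$ of Example 5.9 on $\mathbf{C}^2$. The starting vector $v = (1, 0)$ and the choice of the zero $\lambda = i$ are assumptions made for illustration; the polynomial $p(z) = z^2 + 1$ is read off from the relation $T^2 v = -v$ rather than computed by a general algorithm.

```python
# Operator from Example 5.9 on C^2: T(w, z) = (-z, w).
def T(vec):
    w, z = vec
    return (-z, w)

v = (1, 0)      # an arbitrary nonzero starting vector
Tv = T(v)       # (0, 1)
TTv = T(Tv)     # (-1, 0)

# The list v, Tv, T^2 v is linearly dependent: T^2 v + v = 0.
# Because v and Tv are linearly independent, p(z) = z^2 + 1 is a
# nonconstant polynomial of smallest degree with p(T)v = 0.
assert tuple(a + b for a, b in zip(TTv, v)) == (0, 0)

lam = 1j        # a zero of p, so p(z) = (z - i) q(z) with q(z) = z + i
u = tuple(a + lam * b for a, b in zip(Tv, v))   # q(T)v = Tv + i v

assert u != (0, 0)                               # q(T)v is nonzero
assert T(u) == tuple(lam * x for x in u)         # T u = i u
```

The vector produced is $q(T)v = (i, 1)$, which has the form $(w, -wi)$ with $w = i$, in agreement with the eigenvectors found in Example 5.9.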


The proof above makes crucial use of the fundamental theorem of algebra. 
The comment following Exercise 16 helps explain why the fundamental theorem 
of algebra is so tightly connected to the result above. 

The hypothesis in the result above that F = C cannot be replaced with the 
hypothesis that F = R, as shown by Example 5.9. The next example shows that 
the finite-dimensional hypothesis in the result above also cannot be deleted. 


5.20 example: an operator on a complex vector space with no eigenvalues 


Define $T \in \mathcal{L}(\mathcal{P}(\mathbf{C}))$ by $(Tp)(z) = zp(z)$. If $p \in \mathcal{P}(\mathbf{C})$ is a nonzero polynomial,
then the degree of $Tp$ is one more than the degree of $p$, and thus $Tp$ cannot
equal a scalar multiple of $p$. Hence $T$ has no eigenvalues.

Because $\mathcal{P}(\mathbf{C})$ is infinite-dimensional, this example does not contradict the
result above.
Eigenvalues and the Minimal Polynomial


In this subsection we introduce an important polynomial associated with each 
operator. We begin with the following definition. 


5.21 definition: monic polynomial


A monic polynomial is a polynomial whose highest-degree coefficient equals 1.


For example, the polynomial $2 + 9z^2 + z^7$ is a monic polynomial of degree 7.


5.22 existence, uniqueness, and degree of minimal polynomial 


Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V)$. Then there is a unique monic
polynomial $p \in \mathcal{P}(\mathbf{F})$ of smallest degree such that $p(T) = 0$. Furthermore,
$\deg p \leq \dim V$.


Proof If $\dim V = 0$, then $I$ is the zero operator on $V$ and thus we take $p$ to be
the constant polynomial 1.

Now use induction on $\dim V$. Thus assume that $\dim V > 0$ and that the desired
result is true for all operators on all vector spaces of smaller dimension.
Let $v \in V$ be such that $v \neq 0$. The list $v, Tv, \ldots, T^{\dim V} v$ has length $1 + \dim V$
and thus is linearly dependent. By the linear dependence lemma (2.19), there is
a smallest positive integer $m \leq \dim V$ such that $T^m v$ is a linear combination of
$v, Tv, \ldots, T^{m-1} v$. Thus there exist scalars $c_0, c_1, c_2, \ldots, c_{m-1} \in \mathbf{F}$ such that

5.23    $c_0 v + c_1 Tv + \cdots + c_{m-1} T^{m-1} v + T^m v = 0.$

Define a monic polynomial $q \in \mathcal{P}_m(\mathbf{F})$ by

$q(z) = c_0 + c_1 z + \cdots + c_{m-1} z^{m-1} + z^m.$

Then 5.23 implies that $q(T)v = 0$.
If $k$ is a nonnegative integer, then

$q(T)(T^k v) = T^k(q(T)v) = T^k(0) = 0.$

The linear dependence lemma (2.19) shows that $v, Tv, \ldots, T^{m-1} v$ is linearly independent.
Thus the equation above implies that $\dim \operatorname{null} q(T) \geq m$. Hence

$\dim \operatorname{range} q(T) = \dim V - \dim \operatorname{null} q(T) \leq \dim V - m.$

Because $\operatorname{range} q(T)$ is invariant under $T$ (by 5.18), we can apply our induction
hypothesis to the operator $T|_{\operatorname{range} q(T)}$ on the vector space $\operatorname{range} q(T)$. Thus there
is a monic polynomial $s \in \mathcal{P}(\mathbf{F})$ with

$\deg s \leq \dim V - m$ and $s(T|_{\operatorname{range} q(T)}) = 0.$

Hence for all $v \in V$ we have

$(sq)(T)(v) = s(T)(q(T)v) = 0$

because $q(T)v \in \operatorname{range} q(T)$ and $s(T)|_{\operatorname{range} q(T)} = s(T|_{\operatorname{range} q(T)}) = 0$. Thus $sq$ is a
monic polynomial such that $\deg sq \leq \dim V$ and $(sq)(T) = 0$.


Section 5B = The Minimal Polynomial 145 


The paragraph above shows that there is a monic polynomial of degree at 
most dim V that when applied to T gives the 0 operator. Thus there is a monic 
polynomial of smallest degree with this property, completing the existence part 
of this result. 

Let $p \in \mathcal{P}(\mathbf{F})$ be a monic polynomial of smallest degree such that $p(T) = 0$.
To prove the uniqueness part of the result, suppose $r \in \mathcal{P}(\mathbf{F})$ is a monic polynomial
of the same degree as $p$ and $r(T) = 0$. Then $(p - r)(T) = 0$ and also
$\deg(p - r) < \deg p$. If $p - r$ were not equal to 0, then we could divide $p - r$ by
the coefficient of the highest-order term in $p - r$ to get a monic polynomial (of
smaller degree than $p$) that when applied to $T$ gives the 0 operator. Thus $p - r = 0$,
as desired.


The previous result justifies the following definition. 


5.24 definition: minimal polynomial 


Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V)$. Then the minimal polynomial
of $T$ is the unique monic polynomial $p \in \mathcal{P}(\mathbf{F})$ of smallest degree such that
$p(T) = 0$.


To compute the minimal polynomial of an operator $T \in \mathcal{L}(V)$, we need to
find the smallest positive integer $m$ such that the equation

$c_0 I + c_1 T + \cdots + c_{m-1} T^{m-1} = -T^m$

has a solution $c_0, c_1, \ldots, c_{m-1} \in \mathbf{F}$. If we pick a basis of $V$ and replace $T$ in the
equation above with the matrix of $T$, then the equation above can be thought of
as a system of $(\dim V)^2$ linear equations in the $m$ unknowns $c_0, c_1, \ldots, c_{m-1} \in \mathbf{F}$.
Gaussian elimination or another fast method of solving systems of linear equations
can tell us whether a solution exists, testing successive values $m = 1, 2, \ldots$ until
a solution exists. By 5.22, a solution exists for some smallest positive integer
$m \leq \dim V$. The minimal polynomial of $T$ is then $c_0 + c_1 z + \cdots + c_{m-1} z^{m-1} + z^m$.
Even faster (usually), pick $v \in V$ and consider the equation

5.25    $c_0 v + c_1 Tv + \cdots + c_{\dim V - 1} T^{\dim V - 1} v = -T^{\dim V} v.$

Use a basis of $V$ to convert the equation above to a system of $\dim V$ linear equations
in $\dim V$ unknowns $c_0, c_1, \ldots, c_{\dim V - 1}$. If this system of equations has a
unique solution $c_0, c_1, \ldots, c_{\dim V - 1}$ (as happens most of the time), then the scalars
$c_0, c_1, \ldots, c_{\dim V - 1}, 1$ are the coefficients of the minimal polynomial of $T$ (because
5.22 states that the degree of the minimal polynomial is at most $\dim V$).

Consider operators on $\mathbf{R}^4$ (thought of as 4-by-4 matrices with respect to the
standard basis), and take $v = (1, 0, 0, 0)$ in the paragraph above. The faster method
described above works on over 99.8% of the 4-by-4 matrices with integer entries in
the interval $[-10, 10]$ and on over 99.999% of the 4-by-4 matrices with integer
entries in $[-100, 100]$.

These estimates are based on testing millions of random matrices.
The next example illustrates the faster procedure discussed above.


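5.26 example: minimal polynomial of an operator on $\mathbf{F}^5$


Suppose $T \in \mathcal{L}(\mathbf{F}^5)$ and

$\mathcal{M}(T) = \begin{pmatrix} 0 & 0 & 0 & 0 & -3 \\ 1 & 0 & 0 & 0 & 6 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$

with respect to the standard basis $e_1, e_2, e_3, e_4, e_5$. Taking $v = e_1$ for 5.25, we have

$Te_1 = e_2, \quad T^2 e_1 = Te_2 = e_3, \quad T^3 e_1 = Te_3 = e_4, \quad T^4 e_1 = Te_4 = e_5,$
$T^5 e_1 = T(T^4 e_1) = Te_5 = -3e_1 + 6e_2.$

Thus $3e_1 - 6Te_1 = -T^5 e_1$. The list $e_1, Te_1, T^2 e_1, T^3 e_1, T^4 e_1$, which equals the list
$e_1, e_2, e_3, e_4, e_5$, is linearly independent, so no other linear combination of this list
equals $-T^5 e_1$. Hence the minimal polynomial of $T$ is $3 - 6z + z^5$.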
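
The faster procedure based on 5.25 can be sketched in code as follows. This is a minimal illustration that assumes the entries are rational (so exact arithmetic with `fractions.Fraction` is available) and uses Gauss-Jordan elimination as the linear solver; the function names are hypothetical, and the function returns `None` whenever the system does not have a unique solution, in which case the slower method would be needed.

```python
from fractions import Fraction

def mat_vec(A, v):
    """Apply the matrix A (list of rows) to the vector v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def solve_unique(M, rhs):
    """Gauss-Jordan elimination on the square system M c = rhs.
    Returns the solution, or None when there is no unique solution."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def minimal_polynomial_fast(A, v):
    """Coefficients [c_0, ..., c_{n-1}, 1] obtained from equation 5.25,
    or None when that system has no unique solution."""
    n = len(A)
    powers = [list(v)]                 # v, Tv, T^2 v, ..., T^n v
    for _ in range(n):
        powers.append(mat_vec(A, powers[-1]))
    # Column k of the system is T^k v; the right side is -T^n v.
    M = [[powers[k][i] for k in range(n)] for i in range(n)]
    rhs = [-x for x in powers[n]]
    c = solve_unique(M, rhs)
    return None if c is None else c + [Fraction(1)]

A = [
    [0, 0, 0, 0, -3],
    [1, 0, 0, 0, 6],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
]
coeffs = minimal_polynomial_fast(A, [1, 0, 0, 0, 0])
print([int(c) for c in coeffs])
```

For the matrix of Example 5.26 with $v = e_1$, the coefficients found are 3, $-6$, 0, 0, 0, 1, matching the minimal polynomial $3 - 6z + z^5$.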


Recall that by definition, eigenvalues of operators on V and zeros of polyno- 
mials in P(F) must be elements of F. In particular, if F = R, then eigenvalues 
and zeros must be real numbers. 


5.27 eigenvalues are the zeros of the minimal polynomial 


Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V)$.

(a) The zeros of the minimal polynomial of $T$ are the eigenvalues of $T$.
(b) If $V$ is a complex vector space, then the minimal polynomial of $T$ has the
form

$(z - \lambda_1) \cdots (z - \lambda_m),$

where $\lambda_1, \ldots, \lambda_m$ is a list of all eigenvalues of $T$, possibly with repetitions.


Proof Let p be the minimal polynomial of T. 


(a) First suppose $\lambda \in \mathbf{F}$ is a zero of $p$. Then $p$ can be written in the form

$p(z) = (z - \lambda)q(z),$

where $q$ is a monic polynomial with coefficients in $\mathbf{F}$ (see 4.6). Because
$p(T) = 0$, we have

$0 = (T - \lambda I)(q(T)v)$

for all $v \in V$. Because the degree of $q$ is less than the degree of the minimal
polynomial $p$, there exists at least one vector $v \in V$ such that $q(T)v \neq 0$. The
equation above thus implies that $\lambda$ is an eigenvalue of $T$, as desired.
To prove that every eigenvalue of $T$ is a zero of $p$, now suppose $\lambda \in \mathbf{F}$ is
an eigenvalue of $T$. Thus there exists $v \in V$ with $v \neq 0$ such that $Tv = \lambda v$.
Repeated applications of $T$ to both sides of this equation show that $T^k v = \lambda^k v$
for every nonnegative integer $k$. Thus

$p(T)v = p(\lambda)v.$

Because $p$ is the minimal polynomial of $T$, we have $p(T)v = 0$. Hence the
equation above implies that $p(\lambda) = 0$. Thus $\lambda$ is a zero of $p$, as desired.

(b) To get the desired result, use (a) and the second version of the fundamental
theorem of algebra (see 4.13).


A nonzero polynomial has at most as many distinct zeros as its degree (see 4.8). 
Thus (a) of the previous result, along with the result that the minimal polynomial 
of an operator on V has degree at most dim V, gives an alternative proof of 5.12, 
which states that an operator on V has at most dim V distinct eigenvalues. 

Every monic polynomial is the minimal polynomial of some operator, as 
shown by Exercise 16, which generalizes Example 5.26. Thus 5.27(a) shows that 
finding exact expressions for the eigenvalues of an operator is equivalent to the 
problem of finding exact expressions for the zeros of a polynomial (and thus is 
not possible for some operators). 


5.28 example: An operator whose eigenvalues cannot be found exactly


Let $T \in \mathcal{L}(\mathbf{C}^5)$ be the operator defined by

$T(z_1, z_2, z_3, z_4, z_5) = (-3z_5, z_1 + 6z_5, z_2, z_3, z_4).$

The matrix of $T$ with respect to the standard basis of $\mathbf{C}^5$ is the 5-by-5 matrix in
Example 5.26. As we showed in that example, the minimal polynomial of $T$ is
the polynomial

$3 - 6z + z^5.$

No zero of the polynomial above can be expressed using rational numbers,
roots of rational numbers, and the usual rules of arithmetic (a proof of this would
take us considerably beyond linear algebra). Because the zeros of the polynomial
above are the eigenvalues of $T$ [by 5.27(a)], we cannot find an exact expression
for any eigenvalue of $T$ in any familiar form.

Numeric techniques, which we will not discuss here, show that the zeros of the
polynomial above, and thus the eigenvalues of $T$, are approximately the following
five complex numbers:

$-1.67, \quad 0.51, \quad 1.40, \quad -0.12 + 1.59i, \quad -0.12 - 1.59i.$

Note that the two nonreal zeros of this polynomial are complex conjugates of
each other, as we expect for a polynomial with real coefficients (see 4.14).
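
The text does not say which numeric technique produces these approximations. As one possible illustration (an assumption, not the method behind the numbers above), the sketch below finds the three real zeros of $z^5 - 6z + 3$ by bisection and then recovers the nonreal pair from two standard facts about a monic polynomial of degree 5: its zeros sum to minus the coefficient of $z^4$ (here 0) and multiply to minus the constant term (here $-3$).

```python
import cmath

def f(z):
    """The minimal polynomial from Example 5.28."""
    return z**5 - 6*z + 3

def bisect(lo, hi, steps=100):
    """Approximate a real zero of f in [lo, hi], assuming f changes sign there."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if (f(lo) < 0) == (f(mid) < 0):
            lo = mid       # the sign change lies in [mid, hi]
        else:
            hi = mid       # the sign change lies in [lo, mid]
    return (lo + hi) / 2

# f(-2) < 0 < f(-1), f(0) > 0 > f(1), and f(1) < 0 < f(2) bracket three real zeros.
real_zeros = [bisect(-2, -1), bisect(0, 1), bisect(1, 2)]

# The remaining two zeros have sum s and product prod, so they solve
# z^2 - s z + prod = 0.
s = -sum(real_zeros)
prod = -3 / (real_zeros[0] * real_zeros[1] * real_zeros[2])
disc = cmath.sqrt(s * s - 4 * prod)
complex_zeros = [(s + disc) / 2, (s - disc) / 2]

for z in real_zeros + complex_zeros:
    z = complex(z)
    print(round(z.real, 2), round(z.imag, 2))
```

Rounded to two decimal places, the values printed agree with the five approximations listed in the example.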
The next result completely characterizes the polynomials that when applied to
an operator give the 0 operator.


5.29 $q(T) = 0 \iff q$ is a polynomial multiple of the minimal polynomial


Suppose $V$ is finite-dimensional, $T \in \mathcal{L}(V)$, and $q \in \mathcal{P}(\mathbf{F})$. Then $q(T) = 0$
if and only if $q$ is a polynomial multiple of the minimal polynomial of $T$.


Proof Let $p$ denote the minimal polynomial of $T$.
First suppose $q(T) = 0$. By the division algorithm for polynomials (4.9), there
exist polynomials $s, r \in \mathcal{P}(\mathbf{F})$ such that

5.30    $q = ps + r$

and $\deg r < \deg p$. We have

$0 = q(T) = p(T)s(T) + r(T) = r(T).$

The equation above implies that $r = 0$ (otherwise, dividing $r$ by its highest-degree
coefficient would produce a monic polynomial that when applied to $T$ gives 0;
this polynomial would have a smaller degree than the minimal polynomial, which
would be a contradiction). Thus 5.30 becomes the equation $q = ps$. Hence $q$ is a
polynomial multiple of $p$, as desired.

To prove the other direction, now suppose $q$ is a polynomial multiple of $p$.
Thus there exists a polynomial $s \in \mathcal{P}(\mathbf{F})$ such that $q = ps$. We have

$q(T) = p(T)s(T) = 0\,s(T) = 0,$

as desired.


The next result is a nice consequence of the result above.


5.31 minimal polynomial of a restriction operator


Suppose $V$ is finite-dimensional, $T \in \mathcal{L}(V)$, and $U$ is a subspace of $V$ that is
invariant under $T$. Then the minimal polynomial of $T$ is a polynomial multiple
of the minimal polynomial of $T|_U$.


Proof Suppose $p$ is the minimal polynomial of $T$. Thus $p(T)v = 0$ for all $v \in V$.
In particular,

$p(T)u = 0$ for all $u \in U$.

Thus $p(T|_U) = 0$. Now 5.29, applied to the operator $T|_U$ in place of $T$, implies
that $p$ is a polynomial multiple of the minimal polynomial of $T|_U$.


See Exercise 25 for a result about quotient operators that is analogous to the 
result above. 

The next result shows that the constant term of the minimal polynomial of an 
operator determines whether the operator is invertible. 
5.32 $T$ not invertible $\iff$ constant term of minimal polynomial of $T$ is 0


Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V)$. Then $T$ is not invertible if
and only if the constant term of the minimal polynomial of $T$ is 0.


Proof Suppose $T \in \mathcal{L}(V)$ and $p$ is the minimal polynomial of $T$. Then

$T$ is not invertible $\iff$ 0 is an eigenvalue of $T$
                      $\iff$ 0 is a zero of $p$
                      $\iff$ the constant term of $p$ is 0,

where the first equivalence holds by 5.7, the second equivalence holds by 5.27(a),
and the last equivalence holds because the constant term of $p$ equals $p(0)$.


Eigenvalues on Odd-Dimensional Real Vector Spaces 


The next result will be the key tool that we use to show that every operator on an 
odd-dimensional real vector space has an eigenvalue. 


5.33 even-dimensional null space 


Suppose $\mathbf{F} = \mathbf{R}$ and $V$ is finite-dimensional. Suppose also that $T \in \mathcal{L}(V)$
and $b, c \in \mathbf{R}$ with $b^2 < 4c$. Then $\dim \operatorname{null}(T^2 + bT + cI)$ is an even number.


Proof Recall that $\operatorname{null}(T^2 + bT + cI)$ is invariant under $T$ (by 5.18). By replacing
$V$ with $\operatorname{null}(T^2 + bT + cI)$ and replacing $T$ with $T$ restricted to $\operatorname{null}(T^2 + bT + cI)$,
we can assume that $T^2 + bT + cI = 0$; we now need to prove that $\dim V$ is even.
Suppose $\lambda \in \mathbf{R}$ and $v \in V$ are such that $Tv = \lambda v$. Then

$0 = (T^2 + bT + cI)v = (\lambda^2 + b\lambda + c)v = \Bigl(\bigl(\lambda + \tfrac{b}{2}\bigr)^2 + c - \tfrac{b^2}{4}\Bigr)v.$

The term in large parentheses above is a positive number. Thus the equation above
implies that $v = 0$. Hence we have shown that $T$ has no eigenvectors.

Let $U$ be a subspace of $V$ that is invariant under $T$ and has the largest dimension
among all subspaces of $V$ that are invariant under $T$ and have even dimension. If
$U = V$, then we are done; otherwise assume there exists $w \in V$ such that $w \notin U$.

Let $W = \operatorname{span}(w, Tw)$. Then $W$ is invariant under $T$ because $T(Tw) = -bTw - cw$.
Furthermore, $\dim W = 2$ because otherwise $w$ would be an eigenvector of $T$. Now

$\dim(U + W) = \dim U + \dim W - \dim(U \cap W) = \dim U + 2,$

where $U \cap W = \{0\}$ because otherwise $U \cap W$ would be a one-dimensional
subspace of $V$ that is invariant under $T$ (impossible because $T$ has no eigenvectors).
Because $U + W$ is invariant under $T$, the equation above shows that there exists
a subspace of $V$ invariant under $T$ of even dimension larger than $\dim U$. Thus the
assumption that $U \neq V$ was incorrect. Hence $V$ has even dimension.
The next result states that on odd-dimensional vector spaces, every operator
has an eigenvalue. We already know this result for finite-dimensional complex
vector spaces (without the odd hypothesis). Thus in the proof below, we will
assume that $\mathbf{F} = \mathbf{R}$.


5.34 operators on odd-dimensional vector spaces have eigenvalues 


Every operator on an odd-dimensional vector space has an eigenvalue. 


Proof Suppose $\mathbf{F} = \mathbf{R}$ and $V$ is finite-dimensional. Let $n = \dim V$, and suppose
$n$ is an odd number. Let $T \in \mathcal{L}(V)$. We will use induction on $n$ in steps of size
two to show that $T$ has an eigenvalue. To get started, note that the desired result
holds if $\dim V = 1$ because then every nonzero vector in $V$ is an eigenvector of $T$.

Now suppose that $n \geq 3$ and the desired result holds for all operators on all
odd-dimensional vector spaces of dimension less than $n$. Let $p$ denote the minimal
polynomial of $T$. If $p$ is a polynomial multiple of $x - \lambda$ for some $\lambda \in \mathbf{R}$, then $\lambda$ is
an eigenvalue of $T$ [by 5.27(a)] and we are done. Thus we can assume that there
exist $b, c \in \mathbf{R}$ such that $b^2 < 4c$ and $p$ is a polynomial multiple of $x^2 + bx + c$ (see
4.16).

There exists a monic polynomial $q \in \mathcal{P}(\mathbf{R})$ such that $p(x) = q(x)(x^2 + bx + c)$
for all $x \in \mathbf{R}$. Now

$0 = p(T) = (q(T))(T^2 + bT + cI),$

which means that $q(T)$ equals 0 on $\operatorname{range}(T^2 + bT + cI)$. Because $\deg q < \deg p$
and $p$ is the minimal polynomial of $T$, this implies that $\operatorname{range}(T^2 + bT + cI) \neq V$.
The fundamental theorem of linear maps (3.21) tells us that

$\dim V = \dim \operatorname{null}(T^2 + bT + cI) + \dim \operatorname{range}(T^2 + bT + cI).$

Because $\dim V$ is odd (by hypothesis) and $\dim \operatorname{null}(T^2 + bT + cI)$ is even (by 5.33),
the equation above shows that $\dim \operatorname{range}(T^2 + bT + cI)$ is odd.

Hence $\operatorname{range}(T^2 + bT + cI)$ is a subspace of $V$ that is invariant under $T$ (by
5.18) and that has odd dimension less than $\dim V$. Our induction hypothesis now
implies that $T$ restricted to $\operatorname{range}(T^2 + bT + cI)$ has an eigenvalue, which means
that $T$ has an eigenvalue.


See Exercise 23 in Section 8B and Exercise 10 in Section 9C for alternative 
proofs of the result above. 


Exercises 5B 


1 Suppose $T \in \mathcal{L}(V)$. Prove that 9 is an eigenvalue of $T^2$ if and only if 3 or
$-3$ is an eigenvalue of $T$.

2 Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$ has no eigenvalues.
Prove that every subspace of $V$ invariant under $T$ is either $\{0\}$ or
infinite-dimensional.
3 Suppose $n$ is a positive integer and $T \in \mathcal{L}(\mathbf{F}^n)$ is defined by

$T(x_1, \ldots, x_n) = (x_1 + \cdots + x_n, \ldots, x_1 + \cdots + x_n).$

(a) Find all eigenvalues and eigenvectors of $T$.
(b) Find the minimal polynomial of $T$.

The matrix of $T$ with respect to the standard basis of $\mathbf{F}^n$ consists of all 1's.

4 Suppose $\mathbf{F} = \mathbf{C}$, $T \in \mathcal{L}(V)$, $p \in \mathcal{P}(\mathbf{C})$, and $\alpha \in \mathbf{C}$. Prove that $\alpha$ is an
eigenvalue of $p(T)$ if and only if $\alpha = p(\lambda)$ for some eigenvalue $\lambda$ of $T$.

5 Give an example of an operator on $\mathbf{R}^2$ that shows the result in Exercise 4
does not hold if $\mathbf{C}$ is replaced with $\mathbf{R}$.

6 Suppose $T \in \mathcal{L}(\mathbf{F}^2)$ is defined by $T(w, z) = (-z, w)$. Find the minimal
polynomial of $T$.

7 (a) Give an example of $S, T \in \mathcal{L}(\mathbf{F}^2)$ such that the minimal polynomial of
$ST$ does not equal the minimal polynomial of $TS$.
(b) Suppose $V$ is finite-dimensional and $S, T \in \mathcal{L}(V)$. Prove that if at least
one of $S$, $T$ is invertible, then the minimal polynomial of $ST$ equals the
minimal polynomial of $TS$.

Hint: Show that if $S$ is invertible and $p \in \mathcal{P}(\mathbf{F})$, then $p(TS) = S^{-1}p(ST)S$.

8 Suppose $T \in \mathcal{L}(\mathbf{R}^2)$ is the operator of counterclockwise rotation by 1°. Find
the minimal polynomial of $T$.

Because $\dim \mathbf{R}^2 = 2$, the degree of the minimal polynomial of $T$ is at most 2.
Thus the minimal polynomial of $T$ is not the tempting polynomial $x^{180} + 1$,
even though $T^{180} = -I$.

9 Suppose $T \in \mathcal{L}(V)$ is such that with respect to some basis of $V$, all entries
of the matrix of $T$ are rational numbers. Explain why all coefficients of the
minimal polynomial of $T$ are rational numbers.

10 Suppose $V$ is finite-dimensional, $T \in \mathcal{L}(V)$, and $v \in V$. Prove that

$\operatorname{span}(v, Tv, \ldots, T^m v) = \operatorname{span}(v, Tv, \ldots, T^{\dim V - 1} v)$

for all integers $m \geq \dim V - 1$.

11 Suppose $V$ is a two-dimensional vector space, $T \in \mathcal{L}(V)$, and the matrix of
$T$ with respect to some basis of $V$ is $\begin{pmatrix} a & c \\ b & d \end{pmatrix}$.

(a) Show that $T^2 - (a + d)T + (ad - bc)I = 0$.
(b) Show that the minimal polynomial of $T$ equals

$z - a$ if $b = c = 0$ and $a = d$,
$z^2 - (a + d)z + (ad - bc)$ otherwise.
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Define T € £(F") by T (x1, %9, 3, -05X,) = (X41, 2X, 3X3, ..., 1X,,). Find the 
minimal polynomial of T. 


Suppose T ∈ ℒ(V) and p ∈ P(F). Prove that there exists a unique r ∈ P(F) 
such that p(T) = r(T) and deg r is less than the degree of the minimal 
polynomial of T. 


Suppose V is finite-dimensional and T ∈ ℒ(V) has minimal polynomial 
4 + 5z − 6z^2 − 7z^3 + 2z^4 + z^5. Find the minimal polynomial of T^{−1}. 


Suppose V is a finite-dimensional complex vector space with dim V > 0 
and T ∈ ℒ(V). Define f: C → R by 

\[ f(\lambda) = \dim \operatorname{range}(T - \lambda I). \]

Prove that f is not a continuous function. 


Suppose a_0, ..., a_{n−1} ∈ F. Let T be the operator on F^n whose matrix (with 
respect to the standard basis) is 

\[
\begin{pmatrix}
0 &        &        &   & -a_0 \\
1 & 0      &        &   & -a_1 \\
  & 1      & \ddots &   & -a_2 \\
  &        & \ddots & 0 & -a_{n-2} \\
  &        &        & 1 & -a_{n-1}
\end{pmatrix}.
\]

Here all entries of the matrix are 0 except for all 1’s on the line under the 
diagonal and the entries in the last column (some of which might also be 0). 
Show that the minimal polynomial of T is the polynomial 

\[ a_0 + a_1 z + \cdots + a_{n-1} z^{n-1} + z^n. \]


The matrix above is called the companion matrix of the polynomial above. 
This exercise shows that every monic polynomial is the minimal polynomial 
of some operator. Hence a formula or an algorithm that could produce 
exact eigenvalues for each operator on each F^n could then produce exact 
zeros for each polynomial [by 5.27(a)]. Thus there is no such formula or 
algorithm. However, efficient numeric methods exist for obtaining very good 
approximations for the eigenvalues of an operator. 


Suppose V is finite-dimensional, T ∈ ℒ(V), and p is the minimal polynomial 
of T. Suppose λ ∈ F. Show that the minimal polynomial of T − λI is the 
polynomial q defined by q(z) = p(z + λ). 


Suppose V is finite-dimensional, T ∈ ℒ(V), and p is the minimal polynomial 
of T. Suppose λ ∈ F\{0}. Show that the minimal polynomial of λT is the 
polynomial q defined by 

\[ q(z) = \lambda^{\deg p}\, p\!\left(\frac{z}{\lambda}\right). \]


Suppose V is finite-dimensional and T ∈ ℒ(V). Let ℰ be the subspace of 
ℒ(V) defined by 

\[ \mathcal{E} = \{ q(T) : q \in \mathcal{P}(\mathbf{F}) \}. \]


Prove that dim ℰ equals the degree of the minimal polynomial of T. 


Suppose T ∈ ℒ(F^4) is such that the eigenvalues of T are 3, 5, 8. Prove that 
(T − 3I)^2(T − 5I)^2(T − 8I)^2 = 0. 


Suppose V is finite-dimensional and T ∈ ℒ(V). Prove that the minimal 
polynomial of T has degree at most 1 + dim range T. 

If dim range T < dim V − 1, then this exercise gives a better upper bound 
than 5.22 for the degree of the minimal polynomial of T. 


Suppose V is finite-dimensional and T ∈ ℒ(V). Prove that T is invertible if 
and only if I ∈ span(T, T^2, ..., T^{dim V}). 


Suppose V is finite-dimensional and T ∈ ℒ(V). Let n = dim V. Prove that 
if v ∈ V, then span(v, Tv, ..., T^{n−1}v) is invariant under T. 


Suppose V is a finite-dimensional complex vector space. Suppose T ∈ ℒ(V) 
is such that 5 and 6 are eigenvalues of T and that T has no other eigenvalues. 
Prove that (T − 5I)^{dim V − 1}(T − 6I)^{dim V − 1} = 0. 


Suppose V is finite-dimensional, T ∈ ℒ(V), and U is a subspace of V that 
is invariant under T. 


(a) Prove that the minimal polynomial of T is a polynomial multiple of the 
minimal polynomial of the quotient operator T/U. 


(b) Prove that 
(minimal polynomial of T|_U) × (minimal polynomial of T/U) 
is a polynomial multiple of the minimal polynomial of T. 


The quotient operator T/U was defined in Exercise 38 in Section 5A. 


Suppose V is finite-dimensional, T ∈ ℒ(V), and U is a subspace of V that 
is invariant under T. Prove that the set of eigenvalues of T equals the union 
of the set of eigenvalues of T|_U and the set of eigenvalues of T/U. 


Suppose F = R, V is finite-dimensional, and T ∈ ℒ(V). Prove that the 
minimal polynomial of T_C equals the minimal polynomial of T. 


The complexification T_C was defined in Exercise 33 of Section 3B. 


Suppose V is finite-dimensional and T ∈ ℒ(V). Prove that the minimal 
polynomial of T′ ∈ ℒ(V′) equals the minimal polynomial of T. 

The dual map T′ was defined in Section 3F. 


Show that every operator on a finite-dimensional vector space of dimension 
at least two has an invariant subspace of dimension two. 


Exercise 6 in Section 5C will give an improvement of this result when F = C. 


5C Upper-Triangular Matrices 


In Chapter 3 we defined the matrix of a linear map from a finite-dimensional vector 
space to another finite-dimensional vector space. That matrix depends on a choice 
of basis of each of the two vector spaces. Now that we are studying operators, 
which map a vector space to itself, the emphasis is on using only one basis. 


5.35 definition: matrix of an operator, M(T) 


Suppose T ∈ ℒ(V). The matrix of T with respect to a basis v_1, ..., v_n of V is 
the n-by-n matrix 

\[
\mathcal{M}(T) =
\begin{pmatrix}
A_{1,1} & \cdots & A_{1,n} \\
\vdots  &        & \vdots  \\
A_{n,1} & \cdots & A_{n,n}
\end{pmatrix}
\]

whose entries A_{j,k} are defined by 

\[ T v_k = A_{1,k} v_1 + \cdots + A_{n,k} v_n. \]

The notation M(T, (v_1, ..., v_n)) is used if the basis is not clear from the context. 


Operators have square matrices (meaning that the number of rows equals the 
number of columns), rather than the more general rectangular matrices that we 
considered earlier for linear maps. 

If T is an operator on F^n and no basis is specified, assume that the basis in 
question is the standard one (where the k-th basis vector is 1 in the k-th slot 
and 0 in all other slots). You can then think of the k-th column of M(T) as T 
applied to the k-th basis vector, where we identify n-by-1 column vectors with 
elements of F^n. 

The k-th column of the matrix M(T) is formed from the coefficients used to 
write Tv_k as a linear combination of the basis v_1, ..., v_n. 


5.36 example: matrix of an operator with respect to standard basis 


Define T ∈ ℒ(F^3) by T(x, y, z) = (2x + y, 5y + 3z, 8z). Then the matrix of T 
with respect to the standard basis of F^3 is 

\[
\mathcal{M}(T) =
\begin{pmatrix}
2 & 1 & 0 \\
0 & 5 & 3 \\
0 & 0 & 8
\end{pmatrix},
\]

as you should verify. 
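
The column-by-column description above can be checked mechanically. The following is a minimal Python sketch, not part of the book's development; the helper name matrix_of and the choice to represent vectors as tuples are assumptions made only for this illustration. It applies T to each standard basis vector and stores the result as a column.

```python
def T(v):
    """The operator from Example 5.36 on F^3."""
    x, y, z = v
    return (2 * x + y, 5 * y + 3 * z, 8 * z)


def matrix_of(op, n):
    """Return the n-by-n matrix of op with respect to the standard basis.

    Column k holds the coordinates of op applied to the k-th standard
    basis vector (hypothetical helper written for this sketch).
    """
    columns = []
    for k in range(n):
        e_k = tuple(1 if i == k else 0 for i in range(n))
        columns.append(op(e_k))
    # Transpose the list of columns into a list of rows.
    return [[columns[k][j] for k in range(n)] for j in range(n)]


M = matrix_of(T, 3)
for row in M:
    print(row)
```

The rows printed are the rows of the matrix displayed in Example 5.36.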


A central goal of linear algebra is to show that given an operator T on a finite- 
dimensional vector space V, there exists a basis of V with respect to which T has 
a reasonably simple matrix. To make this vague formulation a bit more precise, 
we might try to choose a basis of V such that M(T) has many 0’s. 

If V is a finite-dimensional complex vector space, then we already know 
enough to show that there is a basis of V with respect to which the matrix of T 
has 0’s everywhere in the first column, except possibly the first entry. In other 
words, there is a basis of V with respect to which the matrix of T looks like 

\[
\begin{pmatrix}
\lambda &   \\
0       &   \\
\vdots  & * \\
0       &
\end{pmatrix};
\]

here * denotes the entries in all columns other than the first column. To prove 
this, let λ be an eigenvalue of T (one exists by 5.19) and let v be a corresponding 
eigenvector. Extend v to a basis of V. Then the matrix of T with respect to this 
basis has the form above. Soon we will see that we can choose a basis of V with 
respect to which the matrix of T has even more 0’s. 


The diagonal of a square matrix consists of the entries on the line from the 
upper left corner to the bottom right corner. For example, the diagonal of the 
matrix 

\[
\mathcal{M}(T) =
\begin{pmatrix}
2 & 1 & 0 \\
0 & 5 & 3 \\
0 & 0 & 8
\end{pmatrix}
\]

from Example 5.36 consists of the entries 2, 5, 8. 


A square matrix is called upper triangular if all entries below the diagonal 
are 0. For example, the 3-by-3 matrix above is upper triangular. 
Typically we represent an upper-triangular matrix in the form 

\[
\begin{pmatrix}
\lambda_1 &        & *         \\
          & \ddots &           \\
0         &        & \lambda_n
\end{pmatrix};
\]

the 0 in the matrix above indicates that all entries below the diagonal in this 
n-by-n matrix equal 0. Upper-triangular matrices can be considered reasonably 
simple: if n is large, then at least almost half the entries in an n-by-n upper- 
triangular matrix are 0. 

We often use * to denote matrix entries that we do not know or that are 
irrelevant to the questions being discussed. 




The next result provides a useful connection between upper-triangular matrices 
and invariant subspaces. 


5.39 conditions for upper-triangular matrix 


Suppose T ∈ ℒ(V) and v_1, ..., v_n is a basis of V. Then the following are 
equivalent. 


(a) The matrix of T with respect to v_1, ..., v_n is upper triangular. 


(b) span(v_1, ..., v_k) is invariant under T for each k = 1, ..., n. 


(c) Tv_k ∈ span(v_1, ..., v_k) for each k = 1, ..., n. 


Proof First suppose (a) holds. To prove that (b) holds, suppose k ∈ {1, ..., n}. If 
j ∈ {1, ..., n}, then 

\[ T v_j \in \operatorname{span}(v_1, \dots, v_j) \]

because the matrix of T with respect to v_1, ..., v_n is upper triangular. Because 
span(v_1, ..., v_j) ⊆ span(v_1, ..., v_k) if j ≤ k, we see that 

\[ T v_j \in \operatorname{span}(v_1, \dots, v_k) \]

for each j ∈ {1, ..., k}. Thus span(v_1, ..., v_k) is invariant under T, completing the 
proof that (a) implies (b). 

Now suppose (b) holds, so span(v_1, ..., v_k) is invariant under T for each 
k = 1, ..., n. In particular, Tv_k ∈ span(v_1, ..., v_k) for each k = 1, ..., n. Thus 
(b) implies (c). 

Now suppose (c) holds, so Tv_k ∈ span(v_1, ..., v_k) for each k = 1, ..., n. This 
means that when writing each Tv_k as a linear combination of the basis vectors 
v_1, ..., v_n, we need to use only the vectors v_1, ..., v_k. Hence all entries under the 
diagonal of M(T) are 0. Thus M(T) is an upper-triangular matrix, completing 
the proof that (c) implies (a). 

We have shown that (a) ⟹ (b) ⟹ (c) ⟹ (a), which shows that (a), (b), 
and (c) are equivalent. 


The next result tells us that if T ∈ ℒ(V) and with respect to some basis of V 
we have 

\[
\mathcal{M}(T) =
\begin{pmatrix}
\lambda_1 &        & *         \\
          & \ddots &           \\
0         &        & \lambda_n
\end{pmatrix},
\]

then T satisfies a simple equation depending on λ_1, ..., λ_n. 


5.40 equation satisfied by operator with upper-triangular matrix 


Suppose T ∈ ℒ(V) and V has a basis with respect to which T has an upper- 
triangular matrix with diagonal entries λ_1, ..., λ_n. Then 

\[ (T - \lambda_1 I) \cdots (T - \lambda_n I) = 0. \]


Proof Let v_1, ..., v_n denote a basis of V with respect to which T has an upper- 
triangular matrix with diagonal entries λ_1, ..., λ_n. Then Tv_1 = λ_1 v_1, which 
means that (T − λ_1 I)v_1 = 0, which implies that (T − λ_1 I)···(T − λ_m I)v_1 = 0 for 
m = 1, ..., n (using the commutativity of each T − λ_j I with each T − λ_k I). 

Note that (T − λ_2 I)v_2 ∈ span(v_1). Thus (T − λ_1 I)(T − λ_2 I)v_2 = 0 (by 
the previous paragraph), which implies that (T − λ_1 I)···(T − λ_m I)v_2 = 0 for 
m = 2, ..., n (using the commutativity of each T − λ_j I with each T − λ_k I). 

Note that (T − λ_3 I)v_3 ∈ span(v_1, v_2). Thus by the previous paragraph, 
(T − λ_1 I)(T − λ_2 I)(T − λ_3 I)v_3 = 0, which implies that (T − λ_1 I)···(T − λ_m I)v_3 = 0 
for m = 3, ..., n (using the commutativity of each T − λ_j I with each T − λ_k I). 

Continuing this pattern, we see that (T − λ_1 I)···(T − λ_n I)v_k = 0 for each 
k = 1, ..., n. Thus (T − λ_1 I)···(T − λ_n I) is the 0 operator because it is 0 on each 
vector in a basis of V. 
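
As a concrete check of 5.40, here is a small Python sketch using the upper-triangular matrix from Example 5.36, whose diagonal entries are 2, 5, 8. The helper names matmul and minus_scalar are assumptions made for this illustration, and matrices are stored as plain lists of rows. The sketch also shows that, for this particular matrix, the product of only the first two factors is not 0.

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def minus_scalar(A, lam):
    """Return the matrix A - lam*I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


# Upper-triangular matrix from Example 5.36, with diagonal entries 2, 5, 8.
M = [[2, 1, 0], [0, 5, 3], [0, 0, 8]]

# (M - 2I)(M - 5I)(M - 8I)
product = minus_scalar(M, 2)
for lam in (5, 8):
    product = matmul(product, minus_scalar(M, lam))

# (M - 2I)(M - 5I), with the last factor left out
two_factors = matmul(minus_scalar(M, 2), minus_scalar(M, 5))

print(product)      # every entry is 0, as 5.40 predicts
print(two_factors)  # not the zero matrix for this M
```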


Unfortunately no method exists for exactly computing the eigenvalues of an 
operator from its matrix. However, if we are fortunate enough to find a basis with 
respect to which the matrix of the operator is upper triangular, then the problem 
of computing the eigenvalues becomes trivial, as the next result shows. 


5.41 determination of eigenvalues from upper-triangular matrix 


Suppose T ∈ ℒ(V) has an upper-triangular matrix with respect to some basis 
of V. Then the eigenvalues of T are precisely the entries on the diagonal of 
that upper-triangular matrix. 


Proof Suppose v_1, ..., v_n is a basis of V with respect to which T has an upper- 
triangular matrix 

\[
\mathcal{M}(T) =
\begin{pmatrix}
\lambda_1 &        & *         \\
          & \ddots &           \\
0         &        & \lambda_n
\end{pmatrix}.
\]

Because Tv_1 = λ_1 v_1, we see that λ_1 is an eigenvalue of T. 
Suppose k ∈ {2, ..., n}. Then (T − λ_k I)v_k ∈ span(v_1, ..., v_{k−1}). Thus T − λ_k I 
maps span(v_1, ..., v_k) into span(v_1, ..., v_{k−1}). Because 

\[ \dim \operatorname{span}(v_1, \dots, v_k) = k \quad\text{and}\quad \dim \operatorname{span}(v_1, \dots, v_{k-1}) = k - 1, \]

this implies that T − λ_k I restricted to span(v_1, ..., v_k) is not injective (by 3.22). 
Thus there exists v ∈ span(v_1, ..., v_k) such that v ≠ 0 and (T − λ_k I)v = 0. Thus 
λ_k is an eigenvalue of T. Hence we have shown that every entry on the diagonal 
of M(T) is an eigenvalue of T. 

To prove T has no other eigenvalues, let q be the polynomial defined by 
q(z) = (z − λ_1)···(z − λ_n). Then q(T) = 0 (by 5.40). Hence q is a polynomial 
multiple of the minimal polynomial of T (by 5.29). Thus every zero of the minimal 
polynomial of T is a zero of q. Because the zeros of the minimal polynomial of 
T are the eigenvalues of T (by 5.27), this implies that every eigenvalue of T is a 
zero of q. Hence the eigenvalues of T are all contained in the list λ_1, ..., λ_n. 


5.42 example: eigenvalues via an upper-triangular matrix 


Define T ∈ ℒ(F^3) by T(x, y, z) = (2x + y, 5y + 3z, 8z). The matrix of T with 
respect to the standard basis is 

\[
\mathcal{M}(T) =
\begin{pmatrix}
2 & 1 & 0 \\
0 & 5 & 3 \\
0 & 0 & 8
\end{pmatrix}.
\]

Now 5.41 implies that the eigenvalues of T are 2, 5, and 8. 


The next example illustrates 5.44: an operator has an upper-triangular matrix 
with respect to some basis if and only if the minimal polynomial of the operator 
is the product of polynomials of degree 1. 


5.43 example: whether T has an upper-triangular matrix can depend on F 


Define T ∈ ℒ(F^4) by 

\[ T(z_1, z_2, z_3, z_4) = (-z_2,\; z_1,\; 2z_1 + 3z_3,\; z_3 + 3z_4). \]

Thus with respect to the standard basis of F^4, the matrix of T is 

\[
\begin{pmatrix}
0 & -1 & 0 & 0 \\
1 & 0  & 0 & 0 \\
2 & 0  & 3 & 0 \\
0 & 0  & 1 & 3
\end{pmatrix}.
\]

You can ask a computer to verify that the minimal polynomial of T is the 
polynomial p defined by 

\[ p(z) = 9 - 6z + 10z^2 - 6z^3 + z^4. \]

First consider the case F = R. Then the polynomial p factors as 

\[ p(z) = (z^2 + 1)(z - 3)(z - 3), \]

with no further factorization of z^2 + 1 as the product of two polynomials of degree 
1 with real coefficients. Thus 5.44 states that there does not exist a basis of R^4 
with respect to which T has an upper-triangular matrix. 

Now consider the case F = C. Then the polynomial p factors as 

\[ p(z) = (z - i)(z + i)(z - 3)(z - 3), \]

where all factors above have the form z − λ_k. Thus 5.44 states that there is a basis of 
C^4 with respect to which T has an upper-triangular matrix. Indeed, you can verify 
that with respect to the basis (4 − 3i, −3 − 4i, −3 + i, 1), (4 + 3i, −3 + 4i, −3 − i, 1), 
(0, 0, 0, 1), (0, 0, 1, 0) of C^4, the operator T has the upper-triangular matrix 

\[
\begin{pmatrix}
i & 0  & 0 & 0 \\
0 & -i & 0 & 0 \\
0 & 0  & 3 & 1 \\
0 & 0  & 0 & 3
\end{pmatrix}.
\]
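
One way to "ask a computer" about the minimal polynomial in Example 5.43 is sketched below in Python, using exact integer arithmetic. This is an illustration under stated assumptions, not the method the text has in mind: the helper names matmul, poly_at, and is_zero are made up for the sketch. It checks that p(T) = 0 and that neither (z^2 + 1)(z − 3) nor (z − 3)^2 annihilates T. Every proper monic divisor of p with real coefficients divides one of those two polynomials, so (by 5.29) these checks are enough to conclude that p is the minimal polynomial.

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_at(coeffs, A):
    """Evaluate c0*I + c1*A + c2*A^2 + ... by Horner's rule.

    coeffs lists the coefficients in order of increasing degree.
    """
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += c
    return result


def is_zero(A):
    """Return True if every entry of A is 0."""
    return all(entry == 0 for row in A for entry in row)


# Matrix of T from Example 5.43 with respect to the standard basis.
M = [[0, -1, 0, 0],
     [1, 0, 0, 0],
     [2, 0, 3, 0],
     [0, 0, 1, 3]]

p = [9, -6, 10, -6, 1]   # 9 - 6z + 10z^2 - 6z^3 + z^4
d1 = [-3, 1, -3, 1]      # (z^2 + 1)(z - 3) = -3 + z - 3z^2 + z^3
d2 = [9, -6, 1]          # (z - 3)^2 = 9 - 6z + z^2

print(is_zero(poly_at(p, M)))   # True
print(is_zero(poly_at(d1, M)))  # False
print(is_zero(poly_at(d2, M)))  # False
```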
5.44 necessary and sufficient condition to have an upper-triangular matrix 


Suppose V is finite-dimensional and T ∈ ℒ(V). Then T has an upper- 
triangular matrix with respect to some basis of V if and only if the minimal 
polynomial of T equals (z − λ_1)···(z − λ_m) for some λ_1, ..., λ_m ∈ F. 


Proof First suppose T has an upper-triangular matrix with respect to some basis 
of V. Let α_1, ..., α_n denote the diagonal entries of that matrix. Define a polynomial 
q ∈ P(F) by 

\[ q(z) = (z - \alpha_1) \cdots (z - \alpha_n). \]

Then q(T) = 0, by 5.40. Hence q is a polynomial multiple of the minimal 
polynomial of T, by 5.29. Thus the minimal polynomial of T equals (z − λ_1)···(z − λ_m) 
for some λ_1, ..., λ_m ∈ F with {λ_1, ..., λ_m} ⊆ {α_1, ..., α_n}. 

To prove the implication in the other direction, now suppose the minimal 
polynomial of T equals (z − λ_1)···(z − λ_m) for some λ_1, ..., λ_m ∈ F. We will use 
induction on m. To get started, if m = 1 then z − λ_1 is the minimal polynomial of 
T, which implies that T = λ_1 I, which implies that the matrix of T (with respect 
to any basis of V) is upper triangular. 

Now suppose m > 1 and the desired result holds for all smaller positive 
integers. Let 

\[ U = \operatorname{range}(T - \lambda_m I). \]

Then U is invariant under T [this is a special case of 5.18 with p(z) = z − λ_m]. 
Thus T|_U is an operator on U. 
If u ∈ U, then u = (T − λ_m I)v for some v ∈ V and 

\[ (T - \lambda_1 I) \cdots (T - \lambda_{m-1} I)u = (T - \lambda_1 I) \cdots (T - \lambda_m I)v = 0. \]

Hence (z − λ_1)···(z − λ_{m−1}) is a polynomial multiple of the minimal polynomial 
of T|_U, by 5.29. Thus the minimal polynomial of T|_U is the product of at most 
m − 1 terms of the form z − λ_k. 

By our induction hypothesis, there is a basis u_1, ..., u_M of U with respect to 
which T|_U has an upper-triangular matrix. Thus for each k ∈ {1, ..., M}, we have 
(using 5.39) 

5.45    Tu_k = (T|_U)(u_k) ∈ span(u_1, ..., u_k). 

Extend u_1, ..., u_M to a basis u_1, ..., u_M, v_1, ..., v_N of V. For each k ∈ {1, ..., N}, 
we have 

\[ T v_k = (T - \lambda_m I)v_k + \lambda_m v_k. \]

The definition of U shows that (T − λ_m I)v_k ∈ U = span(u_1, ..., u_M). Thus the 
equation above shows that 

5.46    Tv_k ∈ span(u_1, ..., u_M, v_1, ..., v_k). 

From 5.45 and 5.46, we conclude (using 5.39) that T has an upper-triangular 
matrix with respect to the basis u_1, ..., u_M, v_1, ..., v_N of V, as desired. 


The set of numbers {λ_1, ..., λ_m} from the previous result equals the set of 
eigenvalues of T (because the set of zeros of the minimal polynomial of T equals 
the set of eigenvalues of T, by 5.27), although the list λ_1, ..., λ_m in the previous 
result may contain repetitions. 

In Chapter 8 we will improve even the wonderful result below; see 8.37 and 
8.46. 


5.47 if F = C, then every operator on V has an upper-triangular matrix 


Suppose V is a finite-dimensional complex vector space and T € £(V). Then 
T has an upper-triangular matrix with respect to some basis of V. 


Proof The desired result follows immediately from 5.44 and the second version 
of the fundamental theorem of algebra (see 4.13). 


For an extension of the result above to two operators S and T such that 
ST = TS, see 5.80. Also, for an extension to more than two operators, see 
Exercise 9(b) in Section 5E. 

Caution: If an operator T ∈ ℒ(V) has an upper-triangular matrix with respect 
to some basis v_1, ..., v_n of V, then the eigenvalues of T are exactly the entries on 
the diagonal of M(T), as shown by 5.41, and furthermore v_1 is an eigenvector of 
T. However, v_2, ..., v_n need not be eigenvectors of T. Indeed, a basis vector v_k is 
an eigenvector of T if and only if all entries in the k-th column of the matrix of T 
are 0, except possibly the k-th entry. 

You may recall from a previous course that every matrix of numbers can 
be changed to a matrix in what is called row echelon form. If one begins with a 
square matrix, the matrix in row echelon form will be an upper-triangular matrix. 
Do not confuse this upper-triangular matrix with the upper-triangular matrix of 
an operator with respect to some basis whose existence is proclaimed by 5.47 (if 
F = C); there is no connection between these upper-triangular matrices. 

The row echelon form of the matrix of an operator does not give us a list 
of the eigenvalues of the operator. In contrast, an upper-triangular matrix 
with respect to some basis gives us a list of all the eigenvalues of the operator. 
However, there is no method for computing exactly such an upper-triangular 
matrix, even though 5.47 guarantees its existence if F = C. 


Exercises 5C 


1 Prove or give a counterexample: If T ∈ ℒ(V) and T^2 has an upper-triangular 
matrix with respect to some basis of V, then T has an upper-triangular matrix 
with respect to some basis of V. 


Suppose A and B are upper-triangular matrices of the same size, with 
α_1, ..., α_n on the diagonal of A and β_1, ..., β_n on the diagonal of B. 


(a) Show that A + B is an upper-triangular matrix with α_1 + β_1, ..., α_n + β_n 
on the diagonal. 

(b) Show that AB is an upper-triangular matrix with α_1 β_1, ..., α_n β_n on the 
diagonal. 


The results in this exercise are used in the proof of 5.81. 


Suppose T ∈ ℒ(V) is invertible and v_1, ..., v_n is a basis of V with respect 
to which the matrix of T is upper triangular, with λ_1, ..., λ_n on the diagonal. 
Show that the matrix of T^{−1} is also upper triangular with respect to the basis 
v_1, ..., v_n, with 

\[ \frac{1}{\lambda_1}, \dots, \frac{1}{\lambda_n} \]

on the diagonal. 


Give an example of an operator whose matrix with respect to some basis 
contains only 0’s on the diagonal, but the operator is invertible. 


This exercise and the exercise below show that 5.41 fails without the hypoth- 
esis that an upper-triangular matrix is under consideration. 


Give an example of an operator whose matrix with respect to some basis 
contains only nonzero numbers on the diagonal, but the operator is not 
invertible. 


Suppose F = C, V is finite-dimensional, and T € £(V). Prove that if 
k € {1,...,dim V}, then V has a k-dimensional subspace invariant under T. 


Suppose V is finite-dimensional, T ∈ ℒ(V), and v ∈ V. 


(a) Prove that there exists a unique monic polynomial p_v of smallest degree 
such that p_v(T)v = 0. 
(b) Prove that the minimal polynomial of T is a polynomial multiple of p_v. 


Suppose V is finite-dimensional, T ∈ ℒ(V), and there exists a nonzero 
vector v ∈ V such that T^2 v + 2Tv = −2v. 


(a) Prove that if F = R, then there does not exist a basis of V with respect 
to which T has an upper-triangular matrix. 

(b) Prove that if F = C and A is an upper-triangular matrix that equals 
the matrix of T with respect to some basis of V, then —1 + i or —1 —i 
appears on the diagonal of A. 


Suppose B is a square matrix with complex entries. Prove that there exists 
an invertible square matrix A with complex entries such that A^{−1}BA is an 
upper-triangular matrix. 


Suppose T ∈ ℒ(V) and v_1, ..., v_n is a basis of V. Show that the following 
are equivalent. 

(a) The matrix of T with respect to v_1, ..., v_n is lower triangular. 

(b) span(v_k, ..., v_n) is invariant under T for each k = 1, ..., n. 

(c) Tv_k ∈ span(v_k, ..., v_n) for each k = 1, ..., n. 


A square matrix is called lower triangular if all entries above the diagonal 
are 0. 


Suppose F = C and V is finite-dimensional. Prove that if T € £(V), then 
there exists a basis of V with respect to which T has a lower-triangular matrix. 


Suppose V is finite-dimensional, T € £(V) has an upper-triangular matrix 
with respect to some basis of V, and U is a subspace of V that is invariant 
under T. 


(a) Prove that T|_U has an upper-triangular matrix with respect to some basis 
of U. 

(b) Prove that the quotient operator T/U has an upper-triangular matrix with 
respect to some basis of V/U. 


The quotient operator T/U was defined in Exercise 38 in Section 5A. 


Suppose V is finite-dimensional and T ∈ ℒ(V). Suppose there exists 
a subspace U of V that is invariant under T such that T|_U has an upper- 
triangular matrix with respect to some basis of U and also T/U has an 
upper-triangular matrix with respect to some basis of V/U. Prove that T has 
an upper-triangular matrix with respect to some basis of V. 


Suppose V is finite-dimensional and T ∈ ℒ(V). Prove that T has an upper- 
triangular matrix with respect to some basis of V if and only if the dual 
operator T′ has an upper-triangular matrix with respect to some basis of the 
dual space V′. 


5D Diagonalizable Operators 


Diagonal Matrices 


A diagonal matrix is a square matrix that is 0 everywhere except possibly on 
the diagonal. 


5.49 example: diagonal matrix 

\[
\begin{pmatrix}
8 & 0 & 0 \\
0 & 5 & 0 \\
0 & 0 & 5
\end{pmatrix}
\]

is a diagonal matrix. 


If an operator has a diagonal matrix with respect to some basis, then the 
entries on the diagonal are precisely the eigenvalues of the operator; this follows 
from 5.41 (or find an easier direct proof for diagonal matrices). 

Every diagonal matrix is upper triangular. Diagonal matrices typically 
have many more 0’s than most upper-triangular matrices of the same size. 


An operator on V is called diagonalizable if the operator has a diagonal matrix 
with respect to some basis of V. 


5.51 example: diagonalization may require a different basis 


Define T ∈ ℒ(R^2) by 

\[ T(x, y) = (41x + 7y,\; -20x + 74y). \]

The matrix of T with respect to the standard basis of R^2 is 

\[
\begin{pmatrix}
41  & 7  \\
-20 & 74
\end{pmatrix},
\]

which is not a diagonal matrix. However, T is diagonalizable. Specifically, the 
matrix of T with respect to the basis (1, 4), (7, 5) is 

\[
\begin{pmatrix}
69 & 0  \\
0  & 46
\end{pmatrix}
\]

because T(1, 4) = (69, 276) = 69(1, 4) and T(7, 5) = (322, 230) = 46(7, 5). 
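
The change of basis in Example 5.51 can be reproduced with exact rational arithmetic. The Python sketch below is only an illustration; the helper name coordinates is an assumption, and it solves the 2-by-2 system by Cramer's rule, which is one convenient choice among several. Each column of the resulting matrix holds the coordinates of T applied to a basis vector, written in terms of the basis (1, 4), (7, 5).

```python
from fractions import Fraction


def T(v):
    """The operator from Example 5.51 on R^2."""
    x, y = v
    return (41 * x + 7 * y, -20 * x + 74 * y)


def coordinates(w, b1, b2):
    """Solve w = c1*b1 + c2*b2 for (c1, c2) by Cramer's rule."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    c1 = Fraction(w[0] * b2[1] - b2[0] * w[1], det)
    c2 = Fraction(b1[0] * w[1] - w[0] * b1[1], det)
    return (c1, c2)


b1, b2 = (1, 4), (7, 5)
col1 = coordinates(T(b1), b1, b2)  # coordinates of T(1, 4) in the new basis
col2 = coordinates(T(b2), b1, b2)  # coordinates of T(7, 5) in the new basis

# Assemble the matrix of T with respect to b1, b2 from its columns.
matrix = [[col1[0], col2[0]], [col1[1], col2[1]]]
print(matrix)
```

The entries are Fraction objects equal to 69, 0, 0, 46, matching the diagonal matrix in the example.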
For λ ∈ F, we will find it convenient to have a name and a notation for the set 
of vectors that an operator T maps to λ times the vector. 


5.52 definition: eigenspace, E(λ, T) 


Suppose T ∈ ℒ(V) and λ ∈ F. The eigenspace of T corresponding to λ is 
the subspace E(λ, T) of V defined by 

\[ E(\lambda, T) = \operatorname{null}(T - \lambda I) = \{ v \in V : Tv = \lambda v \}. \]

Hence E(λ, T) is the set of all eigenvectors of T corresponding to λ, along 
with the 0 vector. 


For T ∈ ℒ(V) and λ ∈ F, the set E(λ, T) is a subspace of V because the null 
space of each linear map on V is a subspace of V. The definitions imply that λ is 
an eigenvalue of T if and only if E(λ, T) ≠ {0}. 


5.53 example: eigenspaces of an operator 


Suppose the matrix of an operator T ∈ ℒ(V) with respect to a basis v_1, v_2, v_3 
of V is the matrix in Example 5.49. Then 

\[ E(8, T) = \operatorname{span}(v_1), \qquad E(5, T) = \operatorname{span}(v_2, v_3). \]


If λ is an eigenvalue of an operator T ∈ ℒ(V), then T restricted to E(λ, T) is 
just the operator of multiplication by λ. 


5.54 sum of eigenspaces is a direct sum 


Suppose T ∈ ℒ(V) and λ_1, ..., λ_m are distinct eigenvalues of T. Then 

\[ E(\lambda_1, T) + \cdots + E(\lambda_m, T) \]

is a direct sum. Furthermore, if V is finite-dimensional, then 

\[ \dim E(\lambda_1, T) + \cdots + \dim E(\lambda_m, T) \le \dim V. \]


Proof To show that E(λ_1, T) + ··· + E(λ_m, T) is a direct sum, suppose 

\[ v_1 + \cdots + v_m = 0, \]

where each v_k is in E(λ_k, T). Because eigenvectors corresponding to distinct 
eigenvalues are linearly independent (by 5.11), this implies that each v_k equals 0. 
Thus E(λ_1, T) + ··· + E(λ_m, T) is a direct sum (by 1.45), as desired. 

Now suppose V is finite-dimensional. Then 

\[
\begin{aligned}
\dim E(\lambda_1, T) + \cdots + \dim E(\lambda_m, T) &= \dim\bigl(E(\lambda_1, T) \oplus \cdots \oplus E(\lambda_m, T)\bigr) \\
&\le \dim V,
\end{aligned}
\]

where the first line follows from 3.94 and the second line follows from 2.37. 




Conditions for Diagonalizability 


The following characterizations of diagonalizable operators will be useful. 


5.55 conditions equivalent to diagonalizability 


Suppose V is finite-dimensional and T ∈ ℒ(V). Let λ_1, ..., λ_m denote the 
distinct eigenvalues of T. Then the following are equivalent. 


(a) T is diagonalizable. 

(b) V has a basis consisting of eigenvectors of T. 

(c) V = E(λ_1, T) ⊕ ··· ⊕ E(λ_m, T). 

(d) dim V = dim E(λ_1, T) + ··· + dim E(λ_m, T). 


Proof An operator T ∈ ℒ(V) has a diagonal matrix 

\[
\begin{pmatrix}
\lambda_1 &        & 0         \\
          & \ddots &           \\
0         &        & \lambda_n
\end{pmatrix}
\]

with respect to a basis v_1, ..., v_n of V if and only if Tv_k = λ_k v_k for each k. Thus 
(a) and (b) are equivalent. 

Suppose (b) holds; thus V has a basis consisting of eigenvectors of T. Hence 
every vector in V is a linear combination of eigenvectors of T, which implies that 

\[ V = E(\lambda_1, T) + \cdots + E(\lambda_m, T). \]

Now 5.54 shows that (c) holds, proving that (b) implies (c). 
That (c) implies (d) follows immediately from 3.94. 
Finally, suppose (d) holds; thus 

5.56    dim V = dim E(λ_1, T) + ··· + dim E(λ_m, T). 

Choose a basis of each E(λ_k, T); put all these bases together to form a list v_1, ..., v_n 
of eigenvectors of T, where n = dim V (by 5.56). To show that this list is linearly 
independent, suppose 

\[ a_1 v_1 + \cdots + a_n v_n = 0, \]

where a_1, ..., a_n ∈ F. For each k = 1, ..., m, let u_k denote the sum of all the terms 
a_j v_j such that v_j ∈ E(λ_k, T). Thus each u_k is in E(λ_k, T), and 

\[ u_1 + \cdots + u_m = 0. \]

Because eigenvectors corresponding to distinct eigenvalues are linearly 
independent (see 5.11), this implies that each u_k equals 0. Because each u_k is a sum of 
terms a_j v_j, where the v_j’s were chosen to be a basis of E(λ_k, T), this implies that 
all a_j’s equal 0. Thus v_1, ..., v_n is linearly independent and hence is a basis of V 
(by 2.38). Thus (d) implies (b), completing the proof. 


For additional conditions equivalent to diagonalizability, see 5.62, Exercises 5 
and 15 in this section, Exercise 24 in Section 7B, and Exercise 15 in Section 8A. 

As we know, every operator on a finite-dimensional complex vector space 
has an eigenvalue. However, not every operator on a finite-dimensional complex 
vector space has enough eigenvectors to be diagonalizable, as shown by the next 
example. 


5.57 example: an operator that is not diagonalizable 


Define an operator T ∈ ℒ(F^3) by T(a, b, c) = (b, c, 0). The matrix of T with 
respect to the standard basis of F^3 is 

\[
\begin{pmatrix}
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0
\end{pmatrix},
\]

which is an upper-triangular matrix but is not a diagonal matrix. 
As you should verify, 0 is the only eigenvalue of T and furthermore 

\[ E(0, T) = \{ (a, 0, 0) \in \mathbf{F}^3 : a \in \mathbf{F} \}. \]

Hence conditions (b), (c), and (d) of 5.55 fail (of course, because these conditions 
are equivalent, it is sufficient to check that only one of them fails). Thus condition 
(a) of 5.55 also fails. Hence T is not diagonalizable, regardless of whether F = R 
or F = C. 


The next result shows that if an operator has as many distinct eigenvalues as 
the dimension of its domain, then the operator is diagonalizable. 


5.58 enough eigenvalues implies diagonalizability 


Suppose V is finite-dimensional and T € £(V) has dim V distinct eigenvalues. 
Then T is diagonalizable. 


Proof Suppose T has distinct eigenvalues λ_1, ..., λ_{dim V}. For each k, let v_k ∈ V 
be an eigenvector corresponding to the eigenvalue λ_k. Because eigenvectors 
corresponding to distinct eigenvalues are linearly independent (see 5.11), 
v_1, ..., v_{dim V} is linearly independent. 

A linearly independent list of dim V vectors in V is a basis of V (see 2.38); thus 
v_1, ..., v_{dim V} is a basis of V. With respect to this basis consisting of eigenvectors, 
T has a diagonal matrix. 


In later chapters we will find additional conditions that imply that certain 
operators are diagonalizable. For example, see the real spectral theorem (7.29) 
and the complex spectral theorem (7.31). 

The result above gives a sufficient condition for an operator to be 
diagonalizable. However, this condition is not necessary. For example, the operator T 
on F^3 defined by T(x, y, z) = (6x, 6y, 7z) has only two eigenvalues (6 and 7) and 
dim F^3 = 3, but T is diagonalizable (by the standard basis of F^3). 

The next example illustrates the importance of diagonalization, which can 
be used to compute high powers of an operator, taking advantage of the 
equation T^k v = λ^k v if v is an eigenvector of T with eigenvalue λ. 

For a spectacular application of these techniques, see Exercise 21, which 
shows how to use diagonalization to find an exact formula for the n-th term 
of the Fibonacci sequence. 


5.59 example: using diagonalization to compute T^{100} 


Define T ∈ ℒ(F^3) by T(x, y, z) = (2x + y, 5y + 3z, 8z). With respect to the 
standard basis, the matrix of T is 

\[
\begin{pmatrix}
2 & 1 & 0 \\
0 & 5 & 3 \\
0 & 0 & 8
\end{pmatrix}.
\]

The matrix above is an upper-triangular matrix but it is not a diagonal matrix. By 
5.41, the eigenvalues of T are 2, 5, and 8. Because T is an operator on a vector 
space of dimension three and T has three distinct eigenvalues, 5.58 assures us that 
there exists a basis of F^3 with respect to which T has a diagonal matrix. 

To find this basis, we only have to find an eigenvector for each eigenvalue. In 
other words, we have to find a nonzero solution to the equation 

\[ T(x, y, z) = \lambda(x, y, z) \]

for λ = 2, then for λ = 5, and then for λ = 8. Solving these simple equations 
shows that for λ = 2 we have an eigenvector (1, 0, 0), for λ = 5 we have an 
eigenvector (1, 3, 0), and for λ = 8 we have an eigenvector (1, 6, 6). 

Thus (1, 0, 0), (1, 3, 0), (1, 6, 6) is a basis of F^3 consisting of eigenvectors of T, 
and with respect to this basis the matrix of T is the diagonal matrix 

\[
\begin{pmatrix}
2 & 0 & 0 \\
0 & 5 & 0 \\
0 & 0 & 8
\end{pmatrix}.
\]

To compute T^{100}(0, 0, 1), for example, write (0, 0, 1) as a linear combination 
of our basis of eigenvectors: 

\[ (0, 0, 1) = \tfrac{1}{6}(1, 0, 0) - \tfrac{1}{3}(1, 3, 0) + \tfrac{1}{6}(1, 6, 6). \]

Now apply T^{100} to both sides of the equation above, getting 

\[
\begin{aligned}
T^{100}(0, 0, 1) &= \tfrac{1}{6}\bigl(T^{100}(1, 0, 0)\bigr) - \tfrac{1}{3}\bigl(T^{100}(1, 3, 0)\bigr) + \tfrac{1}{6}\bigl(T^{100}(1, 6, 6)\bigr) \\
&= \tfrac{1}{6}\bigl(2^{100}(1, 0, 0) - 2 \cdot 5^{100}(1, 3, 0) + 8^{100}(1, 6, 6)\bigr) \\
&= \tfrac{1}{6}\bigl(2^{100} - 2 \cdot 5^{100} + 8^{100},\; 6 \cdot 8^{100} - 6 \cdot 5^{100},\; 6 \cdot 8^{100}\bigr).
\end{aligned}
\]
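
The closed form in Example 5.59 can be compared against brute force. The Python sketch below is an illustration only (the variable names are assumptions): it applies T to (0, 0, 1) one hundred times using exact integer arithmetic and compares the result with the formula obtained from the basis of eigenvectors.

```python
from fractions import Fraction


def T(v):
    """The operator from Example 5.59 on F^3."""
    x, y, z = v
    return (2 * x + y, 5 * y + 3 * z, 8 * z)


# Brute force: apply T one hundred times to (0, 0, 1).
v = (0, 0, 1)
for _ in range(100):
    v = T(v)

# Closed form obtained from the basis of eigenvectors.
closed_form = tuple(
    Fraction(c, 6)
    for c in (2**100 - 2 * 5**100 + 8**100,
              6 * 8**100 - 6 * 5**100,
              6 * 8**100)
)

print(v == closed_form)  # True
```

The brute-force loop needs one hundred applications of T, while the closed form needs only three powers of the eigenvalues, which is the advantage the example is pointing at.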
We saw earlier that an operator T on a finite-dimensional vector space V has an 
upper-triangular matrix with respect to some basis of V if and only if the minimal 
polynomial of T equals (z − λ_1)···(z − λ_m) for some λ_1, ..., λ_m ∈ F (see 5.44). 
As we previously noted (see 5.47), this condition is always satisfied if F = C. 

Our next result 5.62 states that an operator T ∈ ℒ(V) has a diagonal matrix 
with respect to some basis of V if and only if the minimal polynomial of T equals 
(z − λ_1)···(z − λ_m) for some distinct λ_1, ..., λ_m ∈ F. Before formally stating this 
result, we give two examples of using it. 


5.60 example: diagonalizable, but with no known exact eigenvalues 


Define $T \in \mathcal{L}(\mathbf{C}^5)$ by 

$$T(z_1, z_2, z_3, z_4, z_5) = (-3z_5,\; z_1 + 6z_5,\; z_2,\; z_3,\; z_4).$$

The matrix of T is shown in Example 5.26, where we showed that the minimal 
polynomial of T is $3 - 6z + z^5$. 

As mentioned in Example 5.28, no exact expression is known for any of the 
zeros of this polynomial, but numeric techniques show that the zeros of this 
polynomial are approximately $-1.67$, $0.51$, $1.40$, $-0.12 + 1.59i$, $-0.12 - 1.59i$. 

The software that produces these approximations is accurate to more than 
three digits. Thus these approximations are good enough to show that the five 
numbers above are distinct. The minimal polynomial of T equals the fifth degree 
monic polynomial with these zeros. Now 5.62 shows that T is diagonalizable. 
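The approximations above can be reproduced with a short script. This is a sketch of one possible numeric technique (bisection for the real zeros, then Vieta's formulas for the remaining conjugate pair); the text does not say which method the software it mentions uses. The function names are hypothetical.

```python
import math

def p(z):
    """The minimal polynomial 3 - 6z + z^5 from the example."""
    return z**5 - 6 * z + 3

def bisect(lo, hi, steps=60):
    """Approximate a real zero of p in [lo, hi]; p must change sign there."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if (p(lo) < 0) == (p(mid) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# p changes sign on each of these intervals: p(-2) < 0 < p(-1),
# p(0) > 0 > p(1), and p(1) < 0 < p(2).
real_zeros = [bisect(-2, -1), bisect(0, 1), bisect(1, 2)]

# The derivative 5z^4 - 6 has only two real zeros, so p has at most three
# real zeros; the other two zeros form a conjugate pair a + bi, a - bi.
# Vieta's formulas for z^5 - 6z + 3: the five zeros sum to 0 and multiply to -3.
real_part = -sum(real_zeros) / 2
modulus_squared = -3 / math.prod(real_zeros)
imag_part = math.sqrt(modulus_squared - real_part**2)

zeros = real_zeros + [complex(real_part, imag_part),
                      complex(real_part, -imag_part)]
```

The five computed zeros are pairwise well separated, which is all that the argument in the example requires.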


5.61 example: showing that an operator is not diagonalizable 


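Define $T \in \mathcal{L}(\mathbf{F}^3)$ by 

$$T(x, y, z) = (6x + 3y + 4z,\; 6y + 2z,\; 7z).$$

The matrix of T with respect to the standard basis of $\mathbf{F}^3$ is 

$$\begin{pmatrix} 6 & 3 & 4 \\ 0 & 6 & 2 \\ 0 & 0 & 7 \end{pmatrix}.$$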


The matrix above is an upper-triangular matrix but is not a diagonal matrix. Might 
T have a diagonal matrix with respect to some other basis of $\mathbf{F}^3$? 

To answer this question, we will find the minimal polynomial of T. First note 
that the eigenvalues of T are the diagonal entries of the matrix above (by 5.41). 
Thus the zeros of the minimal polynomial of T are 6, 7 [by 5.27(a)]. The diagonal 
of the matrix above tells us that $(T - 6I)^2(T - 7I) = 0$ (by 5.40). The minimal 
polynomial of T has degree at most 3 (by 5.22). Putting all this together, we see 
that the minimal polynomial of T is either $(z - 6)(z - 7)$ or $(z - 6)^2(z - 7)$. 

A simple computation shows that $(T - 6I)(T - 7I) \ne 0$. Thus the minimal 
polynomial of T is $(z - 6)^2(z - 7)$. 

Now 5.62 shows that T is not diagonalizable. 
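The "simple computation" in this example can be carried out by machine. The sketch below uses the matrix of T shown above; the helper names `matmul` and `shift` are hypothetical.

```python
A = [[6, 3, 4],
     [0, 6, 2],
     [0, 0, 7]]

def matmul(X, Y):
    """Product of two 3-by-3 integer matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def shift(X, c):
    """Return X - cI."""
    return [[X[i][j] - (c if i == j else 0) for j in range(3)]
            for i in range(3)]

A_minus_6 = shift(A, 6)
A_minus_7 = shift(A, 7)

# (T - 6I)(T - 7I): nonzero, so (z - 6)(z - 7) is not the minimal polynomial.
low_degree = matmul(A_minus_6, A_minus_7)

# (T - 6I)^2 (T - 7I): the zero matrix.
high_degree = matmul(A_minus_6, low_degree)
```

Here `low_degree` has the nonzero first row (0, -3, 6), while `high_degree` is the zero matrix, matching the conclusion that the minimal polynomial is $(z - 6)^2(z - 7)$.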

5.62 necessary and sufficient condition for diagonalizability 


Suppose V is finite-dimensional and $T \in \mathcal{L}(V)$. Then T is diagonalizable if 
and only if the minimal polynomial of T equals $(z - \lambda_1)\cdots(z - \lambda_m)$ for some 
list of distinct numbers $\lambda_1, \dots, \lambda_m \in \mathbf{F}$. 


Proof First suppose T is diagonalizable. Thus there is a basis $v_1, \dots, v_n$ of V 
consisting of eigenvectors of T. Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues of T. 
Then for each $v_j$, there exists $\lambda_k$ with $(T - \lambda_k I)v_j = 0$. Thus 

$$(T - \lambda_1 I)\cdots(T - \lambda_m I)v_j = 0,$$

which implies that the minimal polynomial of T equals $(z - \lambda_1)\cdots(z - \lambda_m)$. 

To prove the implication in the other direction, now suppose the minimal 
polynomial of T equals $(z - \lambda_1)\cdots(z - \lambda_m)$ for some list of distinct numbers 
$\lambda_1, \dots, \lambda_m \in \mathbf{F}$. Thus 

$$(T - \lambda_1 I)\cdots(T - \lambda_m I) = 0. \tag{5.63}$$

We will prove that T is diagonalizable by induction on m. To get started, 
suppose $m = 1$. Then $T - \lambda_1 I = 0$, which means that T is a scalar multiple of the 
identity operator, which implies that T is diagonalizable. 

Now suppose that $m > 1$ and the desired result holds for all smaller values of 
m. The subspace $\operatorname{range}(T - \lambda_m I)$ is invariant under T [this is a special case of 
5.18 with $p(z) = z - \lambda_m$]. Thus T restricted to $\operatorname{range}(T - \lambda_m I)$ is an operator on 
$\operatorname{range}(T - \lambda_m I)$. 

If $u \in \operatorname{range}(T - \lambda_m I)$, then $u = (T - \lambda_m I)v$ for some $v \in V$, and 5.63 implies 

$$(T - \lambda_1 I)\cdots(T - \lambda_{m-1} I)u = (T - \lambda_1 I)\cdots(T - \lambda_m I)v = 0. \tag{5.64}$$

Hence $(z - \lambda_1)\cdots(z - \lambda_{m-1})$ is a polynomial multiple of the minimal polynomial 
of T restricted to $\operatorname{range}(T - \lambda_m I)$ [by 5.29]. Thus by our induction hypothesis, 
there is a basis of $\operatorname{range}(T - \lambda_m I)$ consisting of eigenvectors of T. 

Suppose that $u \in \operatorname{range}(T - \lambda_m I) \cap \operatorname{null}(T - \lambda_m I)$. Then $Tu = \lambda_m u$. Now 
5.64 implies that 

$$\begin{aligned}
0 &= (T - \lambda_1 I)\cdots(T - \lambda_{m-1} I)u \\
&= (\lambda_m - \lambda_1)\cdots(\lambda_m - \lambda_{m-1})u.
\end{aligned}$$

Because $\lambda_1, \dots, \lambda_m$ are distinct, the equation above implies that $u = 0$. Hence 
$\operatorname{range}(T - \lambda_m I) \cap \operatorname{null}(T - \lambda_m I) = \{0\}$. 

Thus $\operatorname{range}(T - \lambda_m I) + \operatorname{null}(T - \lambda_m I)$ is a direct sum (by 1.46) whose dimension 
is dim V (by 3.94 and 3.21). Hence $\operatorname{range}(T - \lambda_m I) \oplus \operatorname{null}(T - \lambda_m I) = V$. Every 
vector in $\operatorname{null}(T - \lambda_m I)$ is an eigenvector of T with eigenvalue $\lambda_m$. Earlier in this 
proof we saw that there is a basis of $\operatorname{range}(T - \lambda_m I)$ consisting of eigenvectors 
of T. Adjoining to that basis a basis of $\operatorname{null}(T - \lambda_m I)$ gives a basis of V consisting 
of eigenvectors of T. The matrix of T with respect to this basis is a diagonal 
matrix, as desired. 

No formula exists for the zeros of polynomials of degree 5 or greater. However, 
the previous result can be used to determine whether an operator on a complex 
vector space is diagonalizable without even finding approximations of the zeros 
of the minimal polynomial—see Exercise 15. 
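Exercise 15 makes this precise: on a complex vector space, T is diagonalizable if and only if the greatest common divisor of the minimal polynomial p and its derivative p' is 1. The sketch below runs the Euclidean algorithm with exact rational arithmetic on the minimal polynomials from Examples 5.60 and 5.61. Polynomials are stored as coefficient lists, highest degree first; this representation and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def derivative(poly):
    """Derivative of a polynomial given as coefficients, highest degree first."""
    n = len(poly) - 1
    return [c * (n - i) for i, c in enumerate(poly[:-1])]

def remainder(a, b):
    """Remainder of a divided by b; b must have a nonzero leading coefficient."""
    a = [Fraction(c) for c in a]
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)  # the leading coefficient has just been cancelled
    while a and a[0] == 0:
        a.pop(0)  # strip any remaining leading zeros
    return a

def monic_gcd(a, b):
    """Monic greatest common divisor of a and b by the Euclidean algorithm."""
    a = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    while b:
        a, b = b, remainder(a, b)
    return [c / a[0] for c in a]

# Example 5.60: p(z) = z^5 - 6z + 3.
p = [1, 0, 0, 0, -6, 3]
gcd_p = monic_gcd(p, derivative(p))   # constant polynomial 1

# Example 5.61: q(z) = (z - 6)^2 (z - 7) = z^3 - 19z^2 + 120z - 252.
q = [1, -19, 120, -252]
gcd_q = monic_gcd(q, derivative(q))   # z - 6, so q has a repeated zero
```

The first gcd is 1 (diagonalizable) and the second is $z - 6$ (not diagonalizable), in agreement with the two examples, and no zeros had to be approximated.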

The next result will be a key tool when we prove a result about the simultaneous 
diagonalization of two operators; see 5.76. Note how the use of the 
characterization of diagonalizable operators in terms of the minimal polynomial 
(see 5.62) leads to a short proof of the next result. 


5.65 restriction of diagonalizable operator to invariant subspace 


Suppose $T \in \mathcal{L}(V)$ is diagonalizable and U is a subspace of V that is invariant 
under T. Then $T|_U$ is a diagonalizable operator on U. 


Proof Because the operator T is diagonalizable, the minimal polynomial of T 
equals $(z - \lambda_1)\cdots(z - \lambda_m)$ for some list of distinct numbers $\lambda_1, \dots, \lambda_m \in \mathbf{F}$ (by 
5.62). The minimal polynomial of T is a polynomial multiple of the minimal 
polynomial of $T|_U$ (by 5.31). Hence the minimal polynomial of $T|_U$ has the form 
required by 5.62, which shows that $T|_U$ is diagonalizable. 


Gershgorin Disk Theorem 


5.66 definition: Gershgorin disks 


Suppose $T \in \mathcal{L}(V)$ and $v_1, \dots, v_n$ is a basis of V. Let A denote the matrix of 
T with respect to this basis. A Gershgorin disk of T with respect to the basis 
$v_1, \dots, v_n$ is a set of the form 

$$\Bigl\{ z \in \mathbf{F} : |z - A_{j,j}| \le \sum_{\substack{k=1 \\ k \ne j}}^{n} |A_{j,k}| \Bigr\},$$

where $j \in \{1, \dots, n\}$. 


Because there are n choices for j in the definition above, T has n Gershgorin 
disks. If $\mathbf{F} = \mathbf{C}$, then for each $j \in \{1, \dots, n\}$, the corresponding Gershgorin disk 
is a closed disk in $\mathbf{C}$ centered at $A_{j,j}$, which is the $j$th entry on the diagonal of A. 
The radius of this closed disk is the sum of the absolute values of the entries in 
row j of A, excluding the diagonal entry. If $\mathbf{F} = \mathbf{R}$, then the Gershgorin disks are 
closed intervals in $\mathbf{R}$. 

In the special case that the square matrix A above is a diagonal matrix, each 
Gershgorin disk consists of a single point that is a diagonal entry of A (and 
each eigenvalue of T is one of those points, as required by the next result). One 
consequence of our next result is that if the nondiagonal entries of A are small, 
then each eigenvalue of T is near a diagonal entry of A. 

5.67 Gershgorin disk theorem 


Suppose $T \in \mathcal{L}(V)$ and $v_1, \dots, v_n$ is a basis of V. Then each eigenvalue of T 
is contained in some Gershgorin disk of T with respect to the basis $v_1, \dots, v_n$. 


Proof Suppose $\lambda \in \mathbf{F}$ is an eigenvalue of T. Let $w \in V$ be a corresponding 
eigenvector. There exist $c_1, \dots, c_n \in \mathbf{F}$ such that 

$$w = c_1 v_1 + \cdots + c_n v_n. \tag{5.68}$$

Let A denote the matrix of T with respect to the basis $v_1, \dots, v_n$. Applying T 
to both sides of the equation above gives 

$$\lambda w = \sum_{k=1}^{n} c_k T v_k \tag{5.69}$$

$$= \sum_{k=1}^{n} c_k \sum_{j=1}^{n} A_{j,k} v_j$$

$$= \sum_{j=1}^{n} \Bigl( \sum_{k=1}^{n} A_{j,k} c_k \Bigr) v_j. \tag{5.70}$$

Let $j \in \{1, \dots, n\}$ be such that 

$$|c_j| = \max\{|c_1|, \dots, |c_n|\}.$$

Using 5.68, we see that the coefficient of $v_j$ on the left side of 5.69 equals $\lambda c_j$, 
which must equal the coefficient of $v_j$ on the right side of 5.70. In other words, 

$$\lambda c_j = \sum_{k=1}^{n} A_{j,k} c_k.$$

Subtract $A_{j,j} c_j$ from each side of the equation above and then divide both sides 
by $c_j$ to get 

$$|\lambda - A_{j,j}| = \Bigl| \sum_{\substack{k=1 \\ k \ne j}}^{n} A_{j,k} \frac{c_k}{c_j} \Bigr| \le \sum_{\substack{k=1 \\ k \ne j}}^{n} |A_{j,k}|.$$

Thus $\lambda$ is in the $j$th Gershgorin disk with respect to the basis $v_1, \dots, v_n$. 


Exercise 22 gives a nice application of the Gershgorin disk theorem. 
Exercise 23 states that the radius of each Gershgorin disk could be changed 
to the sum of the absolute values of corresponding column entries (instead of row 
entries), excluding the diagonal entry, and the theorem above would still hold. 

The Gershgorin disk theorem is named for Semyon Aronovich Gershgorin, 
who published this result in 1931. 
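As an illustration of definition 5.66 and theorem 5.67, the sketch below computes the Gershgorin disks of a small matrix and checks that each eigenvalue lies in at least one of them. The matrix is a made-up example, not one from the text, and the function names are hypothetical. Its eigenvalues 2 and 5 are the zeros of $z^2 - 7z + 10$.

```python
def gershgorin_disks(A):
    """Return a (center, radius) pair for each row j of the square matrix A.

    The center is the diagonal entry A[j][j]; the radius is the sum of the
    absolute values of the other entries in row j.
    """
    n = len(A)
    return [(A[j][j], sum(abs(A[j][k]) for k in range(n) if k != j))
            for j in range(n)]

def in_some_disk(value, disks):
    """True if value lies in at least one of the closed disks."""
    return any(abs(value - center) <= radius for center, radius in disks)

# Hypothetical example matrix with trace 7 and determinant 10.
A = [[4, 1],
     [2, 3]]
eigenvalues = [2, 5]

disks = gershgorin_disks(A)
all_covered = all(in_some_disk(ev, disks) for ev in eigenvalues)
```

Here the disks are the intervals [3, 5] and [1, 5]. The eigenvalue 2 lies only in the second one, which shows that the theorem promises membership in some disk, not in every disk.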

Exercises 5D 


1 Suppose V is a finite-dimensional complex vector space and $T \in \mathcal{L}(V)$. 

(a) Prove that if $T^4 = I$, then T is diagonalizable. 

(b) Prove that if $T^4 = T$, then T is diagonalizable. 

(c) Give an example of an operator $T \in \mathcal{L}(\mathbf{C}^2)$ such that $T^4 = T^2$ and T is 
not diagonalizable. 


2 Suppose $T \in \mathcal{L}(V)$ has a diagonal matrix A with respect to some basis 
of V. Prove that if $\lambda \in \mathbf{F}$, then $\lambda$ appears on the diagonal of A precisely 
$\dim E(\lambda, T)$ times. 


3 Suppose V is finite-dimensional and $T \in \mathcal{L}(V)$. Prove that if the operator T 
is diagonalizable, then $V = \operatorname{null} T \oplus \operatorname{range} T$. 


4 Suppose V is finite-dimensional and $T \in \mathcal{L}(V)$. Prove that the following 
are equivalent. 

(a) $V = \operatorname{null} T \oplus \operatorname{range} T$. 

(b) $V = \operatorname{null} T + \operatorname{range} T$. 

(c) $\operatorname{null} T \cap \operatorname{range} T = \{0\}$. 


5 Suppose V is a finite-dimensional complex vector space and $T \in \mathcal{L}(V)$. 
Prove that T is diagonalizable if and only if 

$$V = \operatorname{null}(T - \lambda I) \oplus \operatorname{range}(T - \lambda I)$$

for every $\lambda \in \mathbf{C}$. 


6 Suppose $T \in \mathcal{L}(\mathbf{F}^5)$ and $\dim E(8, T) = 4$. Prove that $T - 2I$ or $T - 6I$ is 
invertible. 


7 Suppose $T \in \mathcal{L}(V)$ is invertible. Prove that 

$$E(\lambda, T) = E\bigl(\tfrac{1}{\lambda}, T^{-1}\bigr)$$

for every $\lambda \in \mathbf{F}$ with $\lambda \ne 0$. 


8 Suppose V is finite-dimensional and $T \in \mathcal{L}(V)$. Let $\lambda_1, \dots, \lambda_m$ denote the 
distinct nonzero eigenvalues of T. Prove that 

$$\dim E(\lambda_1, T) + \cdots + \dim E(\lambda_m, T) \le \dim \operatorname{range} T.$$


9 Suppose $R, T \in \mathcal{L}(\mathbf{F}^3)$ each have 2, 6, 7 as eigenvalues. Prove that there 
exists an invertible operator $S \in \mathcal{L}(\mathbf{F}^3)$ such that $R = S^{-1}TS$. 


10 Find $R, T \in \mathcal{L}(\mathbf{F}^4)$ such that R and T each have 2, 6, 7 as eigenvalues, R and 
T have no other eigenvalues, and there does not exist an invertible operator 
$S \in \mathcal{L}(\mathbf{F}^4)$ such that $R = S^{-1}TS$. 


11 Find $T \in \mathcal{L}(\mathbf{C}^3)$ such that 6 and 7 are eigenvalues of T and such that T does 
not have a diagonal matrix with respect to any basis of $\mathbf{C}^3$. 


12 Suppose $T \in \mathcal{L}(\mathbf{C}^3)$ is such that 6 and 7 are eigenvalues of T. Furthermore, 
suppose T does not have a diagonal matrix with respect to any basis of $\mathbf{C}^3$. 
Prove that there exists $(z_1, z_2, z_3) \in \mathbf{C}^3$ such that 

$$T(z_1, z_2, z_3) = (6 + 8z_1,\; 7 + 8z_2,\; 13 + 8z_3).$$


13 Suppose A is a diagonal matrix with distinct entries on the diagonal and B 
is a matrix of the same size as A. Show that $AB = BA$ if and only if B is a 
diagonal matrix. 


14 (a) Give an example of a finite-dimensional complex vector space and an 
operator T on that vector space such that $T^2$ is diagonalizable but T is 
not diagonalizable. 

(b) Suppose $\mathbf{F} = \mathbf{C}$, k is a positive integer, and $T \in \mathcal{L}(V)$ is invertible. 
Prove that T is diagonalizable if and only if $T^k$ is diagonalizable. 


15 Suppose V is a finite-dimensional complex vector space, $T \in \mathcal{L}(V)$, and p 
is the minimal polynomial of T. Prove that the following are equivalent. 

(a) T is diagonalizable. 

(b) There does not exist $\lambda \in \mathbf{C}$ such that p is a polynomial multiple of 
$(z - \lambda)^2$. 

(c) p and its derivative $p'$ have no zeros in common. 

(d) The greatest common divisor of p and $p'$ is the constant polynomial 1. 


The greatest common divisor of p and $p'$ is the monic polynomial q of 
largest degree such that p and $p'$ are both polynomial multiples of q. The 
Euclidean algorithm for polynomials (look it up) can quickly determine 
the greatest common divisor of two polynomials, without requiring any 
information about the zeros of the polynomials. Thus the equivalence of (a) 
and (d) above shows that we can determine whether T is diagonalizable 
without knowing anything about the zeros of p. 


16 Suppose that $T \in \mathcal{L}(V)$ is diagonalizable. Let $\lambda_1, \dots, \lambda_m$ denote the distinct 
eigenvalues of T. Prove that a subspace U of V is invariant under T if and 
only if there exist subspaces $U_1, \dots, U_m$ of V such that $U_k \subseteq E(\lambda_k, T)$ for 
each k and $U = U_1 \oplus \cdots \oplus U_m$. 


17 Suppose V is finite-dimensional. Prove that $\mathcal{L}(V)$ has a basis consisting of 
diagonalizable operators. 


18 Suppose that $T \in \mathcal{L}(V)$ is diagonalizable and U is a subspace of V that is 
invariant under T. Prove that the quotient operator $T/U$ is a diagonalizable 
operator on $V/U$. 


The quotient operator $T/U$ was defined in Exercise 38 in Section 5A. 

19 Prove or give a counterexample: If $T \in \mathcal{L}(V)$ and there exists a subspace U 
of V that is invariant under T such that $T|_U$ and $T/U$ are both diagonalizable, 
then T is diagonalizable. 

See Exercise 13 in Section 5C for an analogous statement about upper-triangular 
matrices. 


20 Suppose V is finite-dimensional and $T \in \mathcal{L}(V)$. Prove that T is diagonalizable 
if and only if the dual operator $T'$ is diagonalizable. 


21 The Fibonacci sequence $F_0, F_1, F_2, \dots$ is defined by 

$$F_0 = 0, \quad F_1 = 1, \quad \text{and} \quad F_n = F_{n-2} + F_{n-1} \text{ for } n \ge 2.$$

Define $T \in \mathcal{L}(\mathbf{R}^2)$ by $T(x, y) = (y, x + y)$. 

(a) Show that $T^n(0, 1) = (F_n, F_{n+1})$ for each nonnegative integer n. 

(b) Find the eigenvalues of T. 

(c) Find a basis of $\mathbf{R}^2$ consisting of eigenvectors of T. 

(d) Use the solution to (c) to compute $T^n(0, 1)$. Conclude that 

$$F_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^{n} - \left( \frac{1 - \sqrt{5}}{2} \right)^{n} \right]$$

for each nonnegative integer n. 

(e) Use (d) to conclude that if n is a nonnegative integer, then the Fibonacci 
number $F_n$ is the integer that is closest to 

$$\frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n}.$$

Each $F_n$ is a nonnegative integer, even though the right side of the formula 
in (d) does not look like an integer. The number 

$$\frac{1 + \sqrt{5}}{2}$$

is called the golden ratio. 
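The statements in this exercise can be checked numerically before they are proved. The sketch below iterates the operator T from the exercise and compares the result with the closed formula in floating-point arithmetic; it is a sanity check rather than a solution, and the function names are hypothetical.

```python
from math import sqrt

def apply_T(v):
    """The operator T(x, y) = (y, x + y) from the exercise."""
    x, y = v
    return (y, x + y)

def fib_by_iteration(n):
    """First coordinate of T^n(0, 1), computed with exact integers."""
    v = (0, 1)
    for _ in range(n):
        v = apply_T(v)
    return v[0]

def fib_by_formula(n):
    """The closed formula from part (d), evaluated in floating point."""
    phi = (1 + sqrt(5)) / 2   # the golden ratio
    psi = (1 - sqrt(5)) / 2
    return (phi**n - psi**n) / sqrt(5)

def nearest_integer_estimate(n):
    """The quantity from part (e), rounded to the nearest integer."""
    phi = (1 + sqrt(5)) / 2
    return round(phi**n / sqrt(5))

first_ten = [fib_by_iteration(n) for n in range(10)]
```

For moderate n (say n below 40) the rounded floating-point values agree with the exact integers. For much larger n, floating-point rounding error eventually makes the formula unreliable, while the integer iteration stays exact.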


22 Suppose $T \in \mathcal{L}(V)$ and A is an n-by-n matrix that is the matrix of T with 
respect to some basis of V. Prove that if 

$$|A_{j,j}| > \sum_{\substack{k=1 \\ k \ne j}}^{n} |A_{j,k}|$$

for each $j \in \{1, \dots, n\}$, then T is invertible. 


This exercise states that if the diagonal entries of the matrix of T are large 
compared to the nondiagonal entries, then T is invertible. 


23 Suppose the definition of the Gershgorin disks is changed so that the radius of 
the $k$th disk is the sum of the absolute values of the entries in column (instead 
of row) k of A, excluding the diagonal entry. Show that the Gershgorin disk 
theorem (5.67) still holds with this changed definition. 

5E Commuting Operators 


5.71 definition: commute 


• Two operators S and T on the same vector space commute if $ST = TS$. 


• Two square matrices A and B of the same size commute if $AB = BA$. 


For example, if T is an operator and $p, q \in \mathcal{P}(\mathbf{F})$, then $p(T)$ and $q(T)$ commute 
[see 5.17(b)]. 

As another example, if I is the identity operator on V, then I commutes with 
every operator on V. 


5.72 example: partial differentiation operators commute 


Suppose m is a nonnegative integer. Let $\mathcal{P}_m(\mathbf{R}^2)$ denote the real vector space of 
polynomials (with real coefficients) in two real variables and of degree at most m, 
with the usual operations of addition and scalar multiplication of real-valued 
functions. Thus the elements of $\mathcal{P}_m(\mathbf{R}^2)$ are functions p on $\mathbf{R}^2$ of the form 

$$p = \sum_{j+k \le m} a_{j,k}\, x^j y^k, \tag{5.73}$$

where the indices j and k take on all nonnegative integer values such that $j + k \le m$, 
each $a_{j,k}$ is in $\mathbf{R}$, and $x^j y^k$ denotes the function on $\mathbf{R}^2$ defined by $(x, y) \mapsto x^j y^k$. 
Define operators $D_x, D_y \in \mathcal{L}(\mathcal{P}_m(\mathbf{R}^2))$ by 

$$D_x p = \frac{\partial p}{\partial x} = \sum_{j+k \le m} j\, a_{j,k}\, x^{j-1} y^k \quad \text{and} \quad D_y p = \frac{\partial p}{\partial y} = \sum_{j+k \le m} k\, a_{j,k}\, x^j y^{k-1},$$

where p is as in 5.73. The operators $D_x$ and $D_y$ are called partial differentiation 
operators because each of these operators differentiates with respect to one of the 
variables while pretending that the other variable is a constant. 

The operators $D_x$ and $D_y$ commute because if p is as in 5.73, then 

$$(D_x D_y)p = \sum_{j+k \le m} jk\, a_{j,k}\, x^{j-1} y^{k-1} = (D_y D_x)p.$$

The equation $D_x D_y = D_y D_x$ on $\mathcal{P}_m(\mathbf{R}^2)$ illustrates a more general result that 
the order of partial differentiation does not matter for nice functions. 


Commuting matrices are unusual. 
For example, there are 214,358,881 pairs 
of 2-by-2 matrices all of whose entries 
are integers in the interval $[-5, 5]$. Only 
about 0.3% of these pairs of matrices 
commute. 

All 214,358,881 (which equals $11^8$) pairs of the 2-by-2 matrices under 
consideration were checked by a computer to discover that only 674,609 of these 
pairs of matrices commute. 
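A brute-force count of this kind is easy to write down. The sketch below is an illustration, not the program used for the count reported above; the function name `count_commuting_pairs` and the parameter `r` (entries range over the integers in $[-r, r]$) are assumptions. Each matrix [[a, b], [c, d]] is stored as the tuple (a, b, c, d).

```python
from itertools import product

def count_commuting_pairs(r):
    """Count ordered pairs (X, Y) of 2-by-2 matrices with integer entries
    in [-r, r] such that XY = YX. Returns (commuting pairs, all pairs)."""
    entries = range(-r, r + 1)
    matrices = list(product(entries, repeat=4))

    def mul(X, Y):
        a, b, c, d = X
        e, f, g, h = Y
        return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)

    commuting = sum(1 for X in matrices for Y in matrices
                    if mul(X, Y) == mul(Y, X))
    return commuting, len(matrices) ** 2

# A small case that runs instantly: entries in [-1, 1], so 3^8 = 6561 pairs.
commuting_small, total_small = count_commuting_pairs(1)
```

Calling `count_commuting_pairs(5)` would examine all $11^8$ pairs discussed above, but a plain Python loop of that size is slow, so the sketch only runs the small case.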

The next result shows that two operators commute if and only if their matrices 
(with respect to the same basis) commute. 


5.74 commuting operators correspond to commuting matrices 


Suppose $S, T \in \mathcal{L}(V)$ and $v_1, \dots, v_n$ is a basis of V. Then S and T commute if 
and only if $\mathcal{M}(S, (v_1, \dots, v_n))$ and $\mathcal{M}(T, (v_1, \dots, v_n))$ commute. 


Proof We have 

$$\begin{aligned}
S \text{ and } T \text{ commute} &\iff ST = TS \\
&\iff \mathcal{M}(ST) = \mathcal{M}(TS) \\
&\iff \mathcal{M}(S)\mathcal{M}(T) = \mathcal{M}(T)\mathcal{M}(S) \\
&\iff \mathcal{M}(S) \text{ and } \mathcal{M}(T) \text{ commute},
\end{aligned}$$

as desired. 


The next result shows that if two operators commute, then every eigenspace 
for one operator is invariant under the other operator. This result, which we will 
use several times, is one of the main reasons why a pair of commuting operators 
behaves better than a pair of operators that does not commute. 


5.75 eigenspace is invariant under commuting operator 


Suppose $S, T \in \mathcal{L}(V)$ commute and $\lambda \in \mathbf{F}$. Then $E(\lambda, S)$ is invariant under T. 


Proof Suppose $v \in E(\lambda, S)$. Then 

$$S(Tv) = (ST)v = (TS)v = T(Sv) = T(\lambda v) = \lambda Tv.$$

The equation above shows that $Tv \in E(\lambda, S)$. Thus $E(\lambda, S)$ is invariant under T. 


Suppose we have two operators, each of which is diagonalizable. If we want 
to do computations involving both operators (for example, involving their sum), 
then we want the two operators to be diagonalizable by the same basis, which 
according to the next result is possible when the two operators commute. 


5.76 simultaneous diagonalizability $\iff$ commutativity 


Two diagonalizable operators on the same vector space have diagonal matrices 
with respect to the same basis if and only if the two operators commute. 


Proof First suppose $S, T \in \mathcal{L}(V)$ have diagonal matrices with respect to the 
same basis. The product of two diagonal matrices of the same size is the diagonal 
matrix obtained by multiplying the corresponding elements of the two diagonals. 
Thus any two diagonal matrices of the same size commute. Thus S and T commute, 
by 5.74. 

To prove the implication in the other direction, now suppose that $S, T \in \mathcal{L}(V)$ 
are diagonalizable operators that commute. Let $\lambda_1, \dots, \lambda_m$ denote the distinct 
eigenvalues of S. Because S is diagonalizable, 5.55(c) shows that 

$$V = E(\lambda_1, S) \oplus \cdots \oplus E(\lambda_m, S). \tag{5.77}$$

For each $k = 1, \dots, m$, the subspace $E(\lambda_k, S)$ is invariant under T (by 5.75). 
Because T is diagonalizable, 5.65 implies that $T|_{E(\lambda_k, S)}$ is diagonalizable for 
each k. Hence for each $k = 1, \dots, m$, there is a basis of $E(\lambda_k, S)$ consisting of 
eigenvectors of T. Putting these bases together gives a basis of V (because of 
5.77), with each vector in this basis being an eigenvector of both S and T. Thus S 
and T both have diagonal matrices with respect to this basis, as desired. 


See Exercise 2 for an extension of the result above to more than two operators. 
Suppose V is a finite-dimensional nonzero complex vector space. Then every 
operator on V has an eigenvector (see 5.19). The next result shows that if two 
operators on V commute, then there is a vector in V that is an eigenvector for both 
operators (but the two commuting operators might not have a common eigenvalue). 
For an extension of the next result to more than two operators, see Exercise 9(a). 


5.78 common eigenvector for commuting operators 


Every pair of commuting operators on a finite-dimensional nonzero complex 
vector space has a common eigenvector. 


Proof Suppose V is a finite-dimensional nonzero complex vector space and 
$S, T \in \mathcal{L}(V)$ commute. Let $\lambda$ be an eigenvalue of S (5.19 tells us that S does 
indeed have an eigenvalue). Thus $E(\lambda, S) \ne \{0\}$. Also, $E(\lambda, S)$ is invariant 
under T (by 5.75). 

Thus $T|_{E(\lambda, S)}$ has an eigenvector (again using 5.19), which is an eigenvector 
for both S and T, completing the proof. 


5.79 example: common eigenvector for partial differentiation operators 


Let $\mathcal{P}_m(\mathbf{R}^2)$ be as in Example 5.72 and let $D_x, D_y \in \mathcal{L}(\mathcal{P}_m(\mathbf{R}^2))$ be the 
commuting partial differentiation operators in that example. As you can verify, 0 
is the only eigenvalue of each of these operators. Also 

$$E(0, D_x) = \Bigl\{ \sum_{k=0}^{m} a_k y^k : a_0, \dots, a_m \in \mathbf{R} \Bigr\},$$

$$E(0, D_y) = \Bigl\{ \sum_{j=0}^{m} c_j x^j : c_0, \dots, c_m \in \mathbf{R} \Bigr\}.$$

The intersection of these two eigenspaces is the set of common eigenvectors of 
the two operators. Because $E(0, D_x) \cap E(0, D_y)$ is the set of constant functions, 
we see that $D_x$ and $D_y$ indeed have a common eigenvector, as promised by 5.78. 

The next result extends 5.47 (the existence of a basis that gives an upper- 
triangular matrix) to two commuting operators. 


5.80 commuting operators are simultaneously upper triangularizable 


Suppose V is a finite-dimensional complex vector space and S, T are 
commuting operators on V. Then there is a basis of V with respect to which 
both S and T have upper-triangular matrices. 


Proof Let $n = \dim V$. We will use induction on n. The desired result holds if 
$n = 1$ because all 1-by-1 matrices are upper triangular. Now suppose $n > 1$ and 
the desired result holds for all complex vector spaces whose dimension is $n - 1$. 

Let $v_1$ be any common eigenvector of S and T (using 5.78). Hence $Sv_1 \in \operatorname{span}(v_1)$ 
and $Tv_1 \in \operatorname{span}(v_1)$. Let W be a subspace of V such that 

$$V = \operatorname{span}(v_1) \oplus W;$$

see 2.33 for the existence of W. Define a linear map $P \colon V \to W$ by 

$$P(av_1 + w) = w$$

for each $a \in \mathbf{C}$ and each $w \in W$. Define $\hat{S}, \hat{T} \in \mathcal{L}(W)$ by 

$$\hat{S}w = P(Sw) \quad \text{and} \quad \hat{T}w = P(Tw)$$

for each $w \in W$. To apply our induction hypothesis to $\hat{S}$ and $\hat{T}$, we must first show 
that these two operators on W commute. To do this, suppose $w \in W$. Then there 
exists $a \in \mathbf{C}$ such that 

$$(\hat{S}\hat{T})w = \hat{S}\bigl(P(Tw)\bigr) = \hat{S}(Tw - av_1) = P\bigl(S(Tw - av_1)\bigr) = P\bigl((ST)w\bigr),$$

where the last equality holds because $v_1$ is an eigenvector of S and $Pv_1 = 0$. 
Similarly, 

$$(\hat{T}\hat{S})w = P\bigl((TS)w\bigr).$$

Because the operators S and T commute, the last two displayed equations show 
that $(\hat{S}\hat{T})w = (\hat{T}\hat{S})w$. Hence $\hat{S}$ and $\hat{T}$ commute. 

Thus we can use our induction hypothesis to state that there exists a basis 
$v_2, \dots, v_n$ of W such that $\hat{S}$ and $\hat{T}$ both have upper-triangular matrices with respect 
to this basis. The list $v_1, \dots, v_n$ is a basis of V. 

If $k \in \{2, \dots, n\}$, then there exist $a_k, b_k \in \mathbf{C}$ such that 

$$Sv_k = a_k v_1 + \hat{S}v_k \quad \text{and} \quad Tv_k = b_k v_1 + \hat{T}v_k.$$

Because $\hat{S}$ and $\hat{T}$ have upper-triangular matrices with respect to $v_2, \dots, v_n$, we 
know that $\hat{S}v_k \in \operatorname{span}(v_2, \dots, v_k)$ and $\hat{T}v_k \in \operatorname{span}(v_2, \dots, v_k)$. Hence the equations 
above imply that 

$$Sv_k \in \operatorname{span}(v_1, \dots, v_k) \quad \text{and} \quad Tv_k \in \operatorname{span}(v_1, \dots, v_k).$$

Thus S and T have upper-triangular matrices with respect to $v_1, \dots, v_n$, as desired. 


Exercise 9(b) extends the result above to more than two operators. 

In general, it is not possible to determine the eigenvalues of the sum or product 
of two operators from the eigenvalues of the two operators. However, the next 
result shows that something nice happens when the two operators commute. 


5.81 eigenvalues of sum and product of commuting operators 


Suppose V is a finite-dimensional complex vector space and S, T are commuting 
operators on V. Then 


• every eigenvalue of $S + T$ is an eigenvalue of S plus an eigenvalue of T, 


• every eigenvalue of $ST$ is an eigenvalue of S times an eigenvalue of T. 


Proof There is a basis of V with respect to which both S and T have upper- 
triangular matrices (by 5.80). With respect to that basis, 

$$\mathcal{M}(S + T) = \mathcal{M}(S) + \mathcal{M}(T) \quad \text{and} \quad \mathcal{M}(ST) = \mathcal{M}(S)\mathcal{M}(T),$$

as stated in 3.35 and 3.43. 

The definition of matrix addition shows that each entry on the diagonal of 
$\mathcal{M}(S + T)$ equals the sum of the corresponding entries on the diagonals of $\mathcal{M}(S)$ 
and $\mathcal{M}(T)$. Similarly, because $\mathcal{M}(S)$ and $\mathcal{M}(T)$ are upper-triangular matrices, 
the definition of matrix multiplication shows that each entry on the diagonal of 
$\mathcal{M}(ST)$ equals the product of the corresponding entries on the diagonals of $\mathcal{M}(S)$ 
and $\mathcal{M}(T)$. Furthermore, $\mathcal{M}(S + T)$ and $\mathcal{M}(ST)$ are upper-triangular matrices 
(see Exercise 2 in Section 5B). 

Every entry on the diagonal of $\mathcal{M}(S)$ is an eigenvalue of S, and every entry 
on the diagonal of $\mathcal{M}(T)$ is an eigenvalue of T (by 5.41). Every eigenvalue 
of $S + T$ is on the diagonal of $\mathcal{M}(S + T)$, and every eigenvalue of $ST$ is on 
the diagonal of $\mathcal{M}(ST)$ (these assertions follow from 5.41). Putting all this 
together, we conclude that every eigenvalue of $S + T$ is an eigenvalue of S plus 
an eigenvalue of T, and every eigenvalue of $ST$ is an eigenvalue of S times an 
eigenvalue of T. 
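Result 5.81 can be illustrated on a small commuting pair. In the sketch below, the matrices are made-up examples regarded as operators on $\mathbf{C}^2$: S is a hypothetical 2-by-2 matrix and T is taken to be $S^2$, which commutes with S because both are polynomials in S. The helper names are assumptions as well. Eigenvalues of a 2-by-2 matrix are computed from its trace and determinant.

```python
import cmath

def eigenvalues_2x2(M):
    """Zeros of z^2 - (trace)z + (determinant) for a 2-by-2 matrix M."""
    (a, b), (c, d) = M
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return [(trace + root) / 2, (trace - root) / 2]

def matmul(X, Y):
    """Product of two 2-by-2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close_to_some(value, candidates, tol=1e-9):
    """True if value is within tol of at least one candidate."""
    return any(abs(value - c) < tol for c in candidates)

S = [[2, 1], [1, 2]]     # eigenvalues 3 and 1
T = matmul(S, S)         # T = S^2 = [[5, 4], [4, 5]], eigenvalues 9 and 1

eig_S, eig_T = eigenvalues_2x2(S), eigenvalues_2x2(T)
sums = [s + t for s in eig_S for t in eig_T]
products = [s * t for s in eig_S for t in eig_T]

S_plus_T = [[S[i][j] + T[i][j] for j in range(2)] for i in range(2)]
sum_ok = all(close_to_some(ev, sums) for ev in eigenvalues_2x2(S_plus_T))
product_ok = all(close_to_some(ev, products)
                 for ev in eigenvalues_2x2(matmul(S, T)))
```

Here $S + T$ has eigenvalues 12 = 3 + 9 and 2 = 1 + 1, and $ST$ has eigenvalues 27 = 3 · 9 and 1 = 1 · 1. Not every sum of eigenvalues occurs: for instance 3 + 1 = 4 is not an eigenvalue of $S + T$, which is consistent with the one-directional statement of 5.81.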


Exercises 5E 


1 Give an example of two commuting operators S, T on $\mathbf{F}^4$ such that there 
is a subspace of $\mathbf{F}^4$ that is invariant under S but not under T and there is a 
subspace of $\mathbf{F}^4$ that is invariant under T but not under S. 


2 Suppose $\mathcal{E}$ is a subset of $\mathcal{L}(V)$ and every element of $\mathcal{E}$ is diagonalizable. 
Prove that there exists a basis of V with respect to which every element of $\mathcal{E}$ 
has a diagonal matrix if and only if every pair of elements of $\mathcal{E}$ commutes. 

This exercise extends 5.76, which considers the case in which $\mathcal{E}$ contains 
only two elements. For this exercise, $\mathcal{E}$ may contain any number of elements, 
and $\mathcal{E}$ may even be an infinite set. 

3 Suppose $S, T \in \mathcal{L}(V)$ are such that $ST = TS$. Suppose $p \in \mathcal{P}(\mathbf{F})$. 

(a) Prove that $\operatorname{null} p(S)$ is invariant under T. 

(b) Prove that $\operatorname{range} p(S)$ is invariant under T. 

See 5.18 for the special case $S = T$. 


4 Prove or give a counterexample: If A is a diagonal matrix and B is an 
upper-triangular matrix of the same size as A, then A and B commute. 


5 Prove that a pair of operators on a finite-dimensional vector space commute 
if and only if their dual operators commute. 

See 3.118 for the definition of the dual of an operator. 


6 Suppose V is a finite-dimensional complex vector space and $S, T \in \mathcal{L}(V)$ 
commute. Prove that there exist $\alpha, \lambda \in \mathbf{C}$ such that 

$$\operatorname{range}(S - \alpha I) + \operatorname{range}(T - \lambda I) \ne V.$$


7 Suppose V is a complex vector space, $S \in \mathcal{L}(V)$ is diagonalizable, and 
$T \in \mathcal{L}(V)$ commutes with S. Prove that there is a basis of V such that S has 
a diagonal matrix with respect to this basis and T has an upper-triangular 
matrix with respect to this basis. 


8 Suppose $m = 3$ in Example 5.72 and $D_x, D_y$ are the commuting partial 
differentiation operators on $\mathcal{P}_3(\mathbf{R}^2)$ from that example. Find a basis of 
$\mathcal{P}_3(\mathbf{R}^2)$ with respect to which $D_x$ and $D_y$ each have an upper-triangular 
matrix. 


9 Suppose V is a finite-dimensional nonzero complex vector space. Suppose 
that $\mathcal{E} \subseteq \mathcal{L}(V)$ is such that S and T commute for all $S, T \in \mathcal{E}$. 

(a) Prove that there is a vector in V that is an eigenvector for every element 
of $\mathcal{E}$. 

(b) Prove that there is a basis of V with respect to which every element of 
$\mathcal{E}$ has an upper-triangular matrix. 

This exercise extends 5.78 and 5.80, which consider the case in which $\mathcal{E}$ 
contains only two elements. For this exercise, $\mathcal{E}$ may contain any number of 
elements, and $\mathcal{E}$ may even be an infinite set. 


10 Give an example of two commuting operators S, T on a finite-dimensional 
real vector space such that $S + T$ has an eigenvalue that does not equal an 
eigenvalue of S plus an eigenvalue of T and $ST$ has an eigenvalue that does 
not equal an eigenvalue of S times an eigenvalue of T. 

This exercise shows that 5.81 does not hold on real vector spaces. 


Chapter 6 
Inner Product Spaces 


In making the definition of a vector space, we generalized the linear structure 
(addition and scalar multiplication) of $\mathbf{R}^2$ and $\mathbf{R}^3$. We ignored geometric features 
such as the notions of length and angle. These ideas are embedded in the concept 
of inner products, which we will investigate in this chapter. 

Every inner product induces a norm, which you can think of as a length. 
This norm satisfies key properties such as the Pythagorean theorem, the triangle 
inequality, the parallelogram equality, and the Cauchy–Schwarz inequality. 

The notion of perpendicular vectors in Euclidean geometry gets renamed to 
orthogonal vectors in the context of an inner product space. We will see that 
orthonormal bases are tremendously useful in inner product spaces. The 
Gram–Schmidt procedure constructs such bases. This chapter will conclude by putting 
together these tools to solve minimization problems. 


standing assumptions for this chapter 


• $\mathbf{F}$ denotes $\mathbf{R}$ or $\mathbf{C}$. 
• V and W denote vector spaces over $\mathbf{F}$. 




The George Peabody Library, now part of Johns Hopkins University, opened while 
James Sylvester (1814–1897) was the university’s first mathematics professor. Sylvester's 
publications include the first use of the word matrix in mathematics. 

Inner Products 


To motivate the concept of inner product, think of vectors in $\mathbf{R}^2$ and $\mathbf{R}^3$ as arrows 
with initial point at the origin. The length of a vector v in $\mathbf{R}^2$ or $\mathbf{R}^3$ is called the 
norm of v and is denoted by $\|v\|$. Thus for $v = (a, b) \in \mathbf{R}^2$, we have 

$$\|v\| = \sqrt{a^2 + b^2}.$$

[Figure: the vector $v = (a, b)$ drawn as an arrow from the origin. This vector v has norm $\sqrt{a^2 + b^2}$.] 

Similarly, if $v = (a, b, c) \in \mathbf{R}^3$, then $\|v\| = \sqrt{a^2 + b^2 + c^2}$. 
Even though we cannot draw pictures in higher dimensions, the generalization 
to $\mathbf{R}^n$ is easy: we define the norm of $x = (x_1, \dots, x_n) \in \mathbf{R}^n$ by 

$$\|x\| = \sqrt{x_1^2 + \cdots + x_n^2}.$$

The norm is not linear on $\mathbf{R}^n$. To inject linearity into the discussion, we 
introduce the dot product. 


6.1. definition: dot product 


For $x, y \in \mathbf{R}^n$, the dot product of x and y, denoted by $x \cdot y$, is defined by 

$$x \cdot y = x_1 y_1 + \cdots + x_n y_n,$$

where $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$. 


The dot product of two vectors in $\mathbf{R}^n$ is a number, not a vector. Notice that 
$x \cdot x = \|x\|^2$ for all $x \in \mathbf{R}^n$. Furthermore, the dot product on $\mathbf{R}^n$ has the following 
properties. 

If we think of a vector as a point instead of as an arrow, then $\|x\|$ should be 
interpreted to mean the distance from the origin to the point x. 


• $x \cdot x \ge 0$ for all $x \in \mathbf{R}^n$. 

• $x \cdot x = 0$ if and only if $x = 0$. 

• For $y \in \mathbf{R}^n$ fixed, the map from $\mathbf{R}^n$ to $\mathbf{R}$ that sends $x \in \mathbf{R}^n$ to $x \cdot y$ is linear. 

• $x \cdot y = y \cdot x$ for all $x, y \in \mathbf{R}^n$. 

An inner product is a generalization of the dot product. At this point you may 
be tempted to guess that an inner product is defined by abstracting the properties 
of the dot product discussed in the last paragraph. For real vector spaces, that 
guess is correct. However, so that we can make a definition that will be useful 
for both real and complex vector spaces, we need to examine the complex case 
before making the definition. 

Recall that if $\lambda = a + bi$, where $a, b \in \mathbf{R}$, then 
• the absolute value of $\lambda$, denoted by $|\lambda|$, is defined by $|\lambda| = \sqrt{a^2 + b^2}$; 
• the complex conjugate of $\lambda$, denoted by $\overline{\lambda}$, is defined by $\overline{\lambda} = a - bi$; 
• $|\lambda|^2 = \lambda \overline{\lambda}$. 


See Chapter 4 for the definitions and the basic properties of the absolute value 
and complex conjugate. 
For $z = (z_1, \dots, z_n) \in \mathbf{C}^n$, we define the norm of z by 

$$\|z\| = \sqrt{|z_1|^2 + \cdots + |z_n|^2}.$$

The absolute values are needed because we want $\|z\|$ to be a nonnegative number. 
Note that 

$$\|z\|^2 = z_1 \overline{z_1} + \cdots + z_n \overline{z_n}.$$

We want to think of $\|z\|^2$ as the inner product of z with itself, as we did 
in $\mathbf{R}^n$. The equation above thus suggests that the inner product of the vector 
$w = (w_1, \dots, w_n) \in \mathbf{C}^n$ with z should equal 

$$w_1 \overline{z_1} + \cdots + w_n \overline{z_n}.$$

If the roles of w and z were interchanged, the expression above would be 
replaced with its complex conjugate. Thus we should expect that the inner product 
of w with z equals the complex conjugate of the inner product of z with w. With 
that motivation, we are now ready to define an inner product on V, which may be 
a real or a complex vector space. 

One comment about the notation used in the next definition: 


• For $\lambda \in \mathbf{C}$, the notation $\lambda \ge 0$ means $\lambda$ is real and nonnegative. 


6.2 definition: inner product 


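An inner product on V is a function that takes each ordered pair $(u, v)$ of 
elements of V to a number $\langle u, v \rangle \in \mathbf{F}$ and has the following properties. 

positivity 
$\langle v, v \rangle \ge 0$ for all $v \in V$. 


definiteness 
$\langle v, v \rangle = 0$ if and only if $v = 0$. 


additivity in first slot 
$\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$ for all $u, v, w \in V$. 


homogeneity in first slot 
$\langle \lambda u, v \rangle = \lambda \langle u, v \rangle$ for all $\lambda \in \mathbf{F}$ and all $u, v \in V$. 


conjugate symmetry 
$\langle u, v \rangle = \overline{\langle v, u \rangle}$ for all $u, v \in V$. 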
Every real number equals its complex conjugate. Thus if we are dealing with
a real vector space, then in the last condition above we can dispense with the
complex conjugate and simply state that
$\langle u, v \rangle = \langle v, u \rangle$ for all $u, v \in V$.

Most mathematicians define an inner product as above, but many physicists
use a definition that requires homogeneity in the second slot instead of
the first slot.


6.3 example: inner products 


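(a) The Euclidean inner product on $\mathbf{F}^n$ is defined by

$$\langle (w_1, \dots, w_n), (z_1, \dots, z_n) \rangle = w_1\overline{z_1} + \cdots + w_n\overline{z_n}$$

for all $(w_1, \dots, w_n), (z_1, \dots, z_n) \in \mathbf{F}^n$.

(b) If $c_1, \dots, c_n$ are positive numbers, then an inner product can be defined on $\mathbf{F}^n$
by

$$\langle (w_1, \dots, w_n), (z_1, \dots, z_n) \rangle = c_1 w_1\overline{z_1} + \cdots + c_n w_n\overline{z_n}$$

for all $(w_1, \dots, w_n), (z_1, \dots, z_n) \in \mathbf{F}^n$.

(c) An inner product can be defined on the vector space of continuous real-valued
functions on the interval $[-1, 1]$ by

$$\langle f, g \rangle = \int_{-1}^{1} f g$$

for all $f, g$ continuous real-valued functions on $[-1, 1]$.

(d) An inner product can be defined on $\mathcal{P}(\mathbf{R})$ by

$$\langle p, q \rangle = p(0)q(0) + \int_{-1}^{1} p' q'$$

for all $p, q \in \mathcal{P}(\mathbf{R})$.

(e) An inner product can be defined on $\mathcal{P}(\mathbf{R})$ by

$$\langle p, q \rangle = \int_{0}^{\infty} p(x)q(x)e^{-x}\,dx$$

for all $p, q \in \mathcal{P}(\mathbf{R})$.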


6.4 definition: inner product space 


An inner product space is a vector space V along with an inner product on V. 


The most important example of an inner product space is $\mathbf{F}^n$ with the Euclidean
inner product given by (a) in the example above. When $\mathbf{F}^n$ is referred to as an
inner product space, you should assume that the inner product is the Euclidean
inner product unless explicitly told otherwise.
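The Euclidean inner product can be tried out numerically. The following is a minimal sketch, not part of the book; the helper names `inner` and `norm` and the sample vectors are illustrative assumptions. It checks conjugate symmetry and homogeneity in the first slot on one example, and shows that a scalar pulled out of the second slot comes out conjugated.

```python
def inner(w, z):
    """Euclidean inner product on F^n: w_1*conj(z_1) + ... + w_n*conj(z_n)."""
    if len(w) != len(z):
        raise ValueError("vectors must have the same length")
    return sum(complex(wk) * complex(zk).conjugate() for wk, zk in zip(w, z))


def norm(z):
    """Norm induced by the inner product; <z, z> is real and nonnegative."""
    return inner(z, z).real ** 0.5


# Sample data chosen only for illustration.
w = [1 + 2j, 3j]
z = [2 - 1j, 1 + 1j]
lam = 2 - 3j

# Conjugate symmetry: <w, z> equals the complex conjugate of <z, w>.
print(inner(w, z), inner(z, w).conjugate())

# Homogeneity in the first slot: <lam*w, z> equals lam*<w, z>.
print(inner([lam * x for x in w], z), lam * inner(w, z))

# In the second slot the scalar comes out conjugated.
print(inner(w, [lam * x for x in z]), lam.conjugate() * inner(w, z))
```

For real vectors the conjugations do nothing, so the same function computes the usual dot product.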
So that we do not have to keep repeating the hypothesis that V and W are inner
product spaces, we make the following assumption. 


For the rest of this chapter and the next chapter, V and W denote inner product 
spaces over F. 


Note the slight abuse of language here. An inner product space is a vector 
space along with an inner product on that vector space. When we say that a vector 
space V is an inner product space, we are also thinking that an inner product on 
V is lurking nearby or is clear from the context (or is the Euclidean inner product 
if the vector space is $\mathbf{F}^n$).


6.6 basic properties of an inner product 


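(a) For each fixed $v \in V$, the function that takes $u \in V$ to $\langle u, v \rangle$ is a linear
map from $V$ to $\mathbf{F}$.

(b) $\langle 0, v \rangle = 0$ for every $v \in V$.

(c) $\langle v, 0 \rangle = 0$ for every $v \in V$.

(d) $\langle u, v + w \rangle = \langle u, v \rangle + \langle u, w \rangle$ for all $u, v, w \in V$.

(e) $\langle u, \lambda v \rangle = \bar{\lambda} \langle u, v \rangle$ for all $\lambda \in \mathbf{F}$ and all $u, v \in V$.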


Proof 
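(a) For $v \in V$, the linearity of $u \mapsto \langle u, v \rangle$ follows from the conditions of additivity
and homogeneity in the first slot in the definition of an inner product.

(b) Every linear map takes 0 to 0. Thus (b) follows from (a).

(c) If $v \in V$, then the conjugate symmetry property in the definition of an inner
product and (b) show that $\langle v, 0 \rangle = \overline{\langle 0, v \rangle} = \overline{0} = 0$.

(d) Suppose $u, v, w \in V$. Then

$$\begin{aligned}
\langle u, v + w \rangle &= \overline{\langle v + w, u \rangle} \\
&= \overline{\langle v, u \rangle + \langle w, u \rangle} \\
&= \overline{\langle v, u \rangle} + \overline{\langle w, u \rangle} \\
&= \langle u, v \rangle + \langle u, w \rangle.
\end{aligned}$$

(e) Suppose $\lambda \in \mathbf{F}$ and $u, v \in V$. Then

$$\begin{aligned}
\langle u, \lambda v \rangle &= \overline{\langle \lambda v, u \rangle} \\
&= \overline{\lambda \langle v, u \rangle} \\
&= \bar{\lambda}\,\overline{\langle v, u \rangle} \\
&= \bar{\lambda} \langle u, v \rangle.
\end{aligned}$$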
Norms


Our motivation for defining inner products came initially from the norms of
vectors on $\mathbf{R}^2$ and $\mathbf{R}^3$. Now we see that each inner product determines a norm.


6.7 definition: norm


For $v \in V$, the norm of $v$, denoted by $\|v\|$, is defined by

$$\|v\| = \sqrt{\langle v, v \rangle}.$$


6.8 example: norms 


(a) If $(z_1, \dots, z_n) \in \mathbf{F}^n$ (with the Euclidean inner product), then

$$\|(z_1, \dots, z_n)\| = \sqrt{|z_1|^2 + \cdots + |z_n|^2}.$$

(b) For $f$ in the vector space of continuous real-valued functions on $[-1, 1]$ and
with inner product given as in 6.3(c), we have

$$\|f\| = \sqrt{\int_{-1}^{1} f^2}.$$

6.9 basic properties of the norm 


Suppose v € V. 


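(a) $\|v\| = 0$ if and only if $v = 0$.
(b) $\|\lambda v\| = |\lambda|\,\|v\|$ for all $\lambda \in \mathbf{F}$.


Proof
(a) The desired result holds because $\langle v, v \rangle = 0$ if and only if $v = 0$.

(b) Suppose $\lambda \in \mathbf{F}$. Then

$$\begin{aligned}
\|\lambda v\|^2 &= \langle \lambda v, \lambda v \rangle \\
&= \lambda \langle v, \lambda v \rangle \\
&= \lambda\bar{\lambda} \langle v, v \rangle \\
&= |\lambda|^2 \|v\|^2.
\end{aligned}$$

Taking square roots now gives the desired equality.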


The proof of (b) in the result above illustrates a general principle: working 
with norms squared is usually easier than working directly with norms. 
Now we come to a crucial definition.


6.10 definition: orthogonal


Two vectors $u, v \in V$ are called orthogonal if $\langle u, v \rangle = 0$.


In the definition above, the order of the two vectors does not matter, because
$\langle u, v \rangle = 0$ if and only if $\langle v, u \rangle = 0$. Instead of saying $u$ and $v$ are orthogonal,
sometimes we say $u$ is orthogonal to $v$.

The word orthogonal comes from the Greek word orthogonios, which means
right-angled.

Exercise 15 asks you to prove that if $u, v$ are nonzero vectors in $\mathbf{R}^2$, then

$$\langle u, v \rangle = \|u\|\,\|v\| \cos\theta,$$

where $\theta$ is the angle between $u$ and $v$ (thinking of $u$ and $v$ as arrows with initial
point at the origin). Thus two nonzero vectors in $\mathbf{R}^2$ are orthogonal (with respect
to the Euclidean inner product) if and only if the cosine of the angle between
them is 0, which happens if and only if the vectors are perpendicular in the usual
sense of plane geometry. Thus you can think of the word orthogonal as a fancy
word meaning perpendicular.

We begin our study of orthogonality with an easy result. 


6.11 orthogonality and 0


(a) 0 is orthogonal to every vector in $V$.

(b) 0 is the only vector in $V$ that is orthogonal to itself.


Proof
(a) Recall that 6.6(b) states that $\langle 0, v \rangle = 0$ for every $v \in V$.

(b) If $v \in V$ and $\langle v, v \rangle = 0$, then $v = 0$ (by definition of inner product).


For the special case $V = \mathbf{R}^2$, the next theorem was known over 3,500 years ago
in Babylonia and then rediscovered and proved over 2,500 years ago in Greece.
Of course, the proof below is not the original proof.


6.12 Pythagorean theorem


Suppose $u, v \in V$. If $u$ and $v$ are orthogonal, then

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$


Proof Suppose $\langle u, v \rangle = 0$. Then

$$\begin{aligned}
\|u + v\|^2 &= \langle u + v, u + v \rangle \\
&= \langle u, u \rangle + \langle u, v \rangle + \langle v, u \rangle + \langle v, v \rangle \\
&= \|u\|^2 + \|v\|^2.
\end{aligned}$$
Suppose $u, v \in V$, with $v \ne 0$. We would like to write $u$ as a scalar multiple
of $v$ plus a vector $w$ orthogonal to $v$, as suggested in the picture here.


An orthogonal decomposition:
$u$ expressed as a scalar multiple of $v$ plus a vector orthogonal to $v$.


To discover how to write $u$ as a scalar multiple of $v$ plus a vector orthogonal
to $v$, let $c \in \mathbf{F}$ denote a scalar. Then

$$u = cv + (u - cv).$$

Thus we need to choose $c$ so that $v$ is orthogonal to $(u - cv)$. Hence we want

$$0 = \langle u - cv, v \rangle = \langle u, v \rangle - c\|v\|^2.$$

The equation above shows that we should choose $c$ to be $\langle u, v \rangle / \|v\|^2$. Making this
choice of $c$, we can write

$$u = \frac{\langle u, v \rangle}{\|v\|^2}\, v + \Bigl( u - \frac{\langle u, v \rangle}{\|v\|^2}\, v \Bigr).$$

As you should verify, the equation displayed above explicitly writes $u$ as a scalar
multiple of $v$ plus a vector orthogonal to $v$. Thus we have proved the following
key result.


6.13 an orthogonal decomposition


Suppose $u, v \in V$, with $v \ne 0$. Set $c = \dfrac{\langle u, v \rangle}{\|v\|^2}$ and $w = u - \dfrac{\langle u, v \rangle}{\|v\|^2}\, v$. Then

$$u = cv + w \quad \text{and} \quad \langle w, v \rangle = 0.$$


The orthogonal decomposition 6.13 will be used in the proof of the Cauchy— 
Schwarz inequality, which is our next result and is one of the most important 
inequalities in mathematics. 
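The decomposition in 6.13 is easy to compute. The sketch below is an illustration only, assuming vectors in $\mathbf{F}^n$ with the Euclidean inner product; the function names and the sample vectors are not from the text. It returns the scalar $c$ and the vector $w$, and the printed inner products of $w$ with $v$ should be 0 up to rounding.

```python
def inner(u, v):
    """Euclidean inner product on F^n (conjugate taken in the second slot)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


def orthogonal_decomposition(u, v):
    """Return (c, w) with u = c*v + w and <w, v> = 0, following 6.13."""
    vv = inner(v, v).real  # this is ||v||^2
    if vv == 0:
        raise ValueError("v must be nonzero")
    c = inner(u, v) / vv
    w = [a - c * b for a, b in zip(u, v)]
    return c, w


# A real example and a complex example (sample data only).
for u, v in [([3, 1], [1, 1]), ([1 + 1j, 2], [1j, 1])]:
    c, w = orthogonal_decomposition(u, v)
    print(c, w, inner(w, v))
```

The scalar is computed as $\langle u, v \rangle / \|v\|^2$ with $u$ in the first slot; swapping the slots would give the conjugate of $c$, which matters only in the complex case.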
6.14 Cauchy–Schwarz inequality


Suppose $u, v \in V$. Then

$$|\langle u, v \rangle| \le \|u\|\,\|v\|.$$

This inequality is an equality if and only if one of $u, v$ is a scalar multiple of
the other.


Proof If $v = 0$, then both sides of the desired inequality equal 0. Thus we can
assume that $v \ne 0$. Consider the orthogonal decomposition

$$u = \frac{\langle u, v \rangle}{\|v\|^2}\, v + w$$

given by 6.13, where $w$ is orthogonal to $v$. By the Pythagorean theorem,

$$\begin{aligned}
\|u\|^2 &= \Bigl\| \frac{\langle u, v \rangle}{\|v\|^2}\, v \Bigr\|^2 + \|w\|^2 \\
&= \frac{|\langle u, v \rangle|^2}{\|v\|^2} + \|w\|^2 \\
&\ge \frac{|\langle u, v \rangle|^2}{\|v\|^2}. && \text{(6.15)}
\end{aligned}$$

Multiplying both sides of this inequality by $\|v\|^2$ and then taking square roots
gives the desired inequality.

The proof in the paragraph above shows that the Cauchy–Schwarz inequality
is an equality if and only if 6.15 is an equality. This happens if and only
if $w = 0$. But $w = 0$ if and only if $u$ is a multiple of $v$ (see 6.13). Thus the
Cauchy–Schwarz inequality is an equality if and only if $u$ is a scalar multiple of $v$
or $v$ is a scalar multiple of $u$ (or both; the phrasing has been chosen to cover cases
in which either $u$ or $v$ equals 0).


Augustin-Louis Cauchy (1789-1857) 
proved 6.16(a) in 1821. In 1859, 
Cauchy’s student Viktor Bunyakovsky 
(1804-1889) proved integral inequal- 
ities like the one in 6.16(b). A few 
decades later, similar discoveries by 
Hermann Schwarz (1843-1921) at- 
tracted more attention and led to the 
name of this inequality. 


6.16 example: Cauchy—Schwarz inequality 


(a) If $x_1, \dots, x_n, y_1, \dots, y_n \in \mathbf{R}$, then

$$(x_1 y_1 + \cdots + x_n y_n)^2 \le (x_1^2 + \cdots + x_n^2)(y_1^2 + \cdots + y_n^2),$$

as follows from applying the Cauchy–Schwarz inequality to the vectors
$(x_1, \dots, x_n), (y_1, \dots, y_n) \in \mathbf{R}^n$ using the usual Euclidean inner product.

(b) If $f, g$ are continuous real-valued functions on $[-1, 1]$, then

$$\Bigl| \int_{-1}^{1} f g \Bigr|^2 \le \Bigl( \int_{-1}^{1} f^2 \Bigr) \Bigl( \int_{-1}^{1} g^2 \Bigr),$$

as follows from applying the Cauchy–Schwarz inequality to Example 6.3(c).
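A numerical spot check of the Cauchy–Schwarz inequality in $\mathbf{R}^n$ can make the equality condition concrete. This is a sketch under stated assumptions: the helper `cauchy_schwarz_gap`, the random sample vectors, and the rounding tolerance are illustrative choices, and a finite check is of course not a proof.

```python
import math
import random


def inner(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(inner(x, x))


def cauchy_schwarz_gap(x, y):
    """Return ||x|| ||y|| - |<x, y>|; by 6.14 this is nonnegative (up to rounding)."""
    return norm(x) * norm(y) - abs(inner(x, y))


random.seed(0)  # fixed seed so that repeated runs use the same sample vectors
for _ in range(5):
    x = [random.uniform(-1, 1) for _ in range(4)]
    y = [random.uniform(-1, 1) for _ in range(4)]
    print(cauchy_schwarz_gap(x, y) >= -1e-12)

# Equality case: one vector is a scalar multiple of the other, so the gap
# should vanish apart from floating-point rounding.
x = [1.0, -2.0, 3.0]
print(cauchy_schwarz_gap(x, [-2.5 * a for a in x]))
```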


The next result, called the triangle inequality,
has the geometric interpretation that the length 
of each side of a triangle is less than the sum of 
the lengths of the other two sides. 

Note that the triangle inequality implies that 
the shortest polygonal path between two points is 
a single line segment (a polygonal path consists 
of line segments). 


In this triangle, the length of 
u + v is less than the length 
of u plus the length of v. 


6.17 triangle inequality 


Suppose u,v € V. Then 


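$$\|u + v\| \le \|u\| + \|v\|.$$


This inequality is an equality if and only if one of $u, v$ is a nonnegative real
multiple of the other.


Proof We have

$$\begin{aligned}
\|u + v\|^2 &= \langle u + v, u + v \rangle \\
&= \langle u, u \rangle + \langle v, v \rangle + \langle u, v \rangle + \langle v, u \rangle \\
&= \langle u, u \rangle + \langle v, v \rangle + \langle u, v \rangle + \overline{\langle u, v \rangle} \\
&= \|u\|^2 + \|v\|^2 + 2\operatorname{Re}\langle u, v \rangle \\
&\le \|u\|^2 + \|v\|^2 + 2|\langle u, v \rangle| && \text{(6.18)} \\
&\le \|u\|^2 + \|v\|^2 + 2\|u\|\,\|v\| && \text{(6.19)} \\
&= (\|u\| + \|v\|)^2,
\end{aligned}$$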

where 6.19 follows from the Cauchy—Schwarz inequality (6.14). Taking square 
roots of both sides of the inequality above gives the desired inequality. 

The proof above shows that the triangle inequality is an equality if and only if 
we have equality in 6.18 and 6.19. Thus we have equality in the triangle inequality 
if and only if 


$$\langle u, v \rangle = \|u\|\,\|v\|. \qquad \text{(6.20)}$$


If one of u,v is a nonnegative real multiple of the other, then 6.20 holds. Con- 
versely, suppose 6.20 holds. Then the condition for equality in the Cauchy— 
Schwarz inequality (6.14) implies that one of u,v is a scalar multiple of the other. 
This scalar must be a nonnegative real number, by 6.20, completing the proof. 


For the reverse triangle inequality, see Exercise 20. 
The next result is called the parallelogram equality because of its geometric
interpretation: in every parallelogram, the sum of the squares of the lengths of the
diagonals equals the sum of the squares of the lengths of the four sides. Note that the
proof here is more straightforward than the usual proof in Euclidean geometry.

The diagonals of this parallelogram are $u + v$ and $u - v$.


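6.21 parallelogram equality


Suppose $u, v \in V$. Then

$$\|u + v\|^2 + \|u - v\|^2 = 2(\|u\|^2 + \|v\|^2).$$


Proof We have

$$\begin{aligned}
\|u + v\|^2 + \|u - v\|^2 &= \langle u + v, u + v \rangle + \langle u - v, u - v \rangle \\
&= \|u\|^2 + \|v\|^2 + \langle u, v \rangle + \langle v, u \rangle \\
&\quad + \|u\|^2 + \|v\|^2 - \langle u, v \rangle - \langle v, u \rangle \\
&= 2(\|u\|^2 + \|v\|^2),
\end{aligned}$$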


as desired. 
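The parallelogram equality can be checked exactly on integer vectors, in keeping with the earlier remark that norms squared are easier to work with than norms. The sketch below assumes the Euclidean inner product on $\mathbf{R}^n$; the function names and sample pairs are illustrative.

```python
def norm_sq(x):
    """Squared Euclidean norm <x, x> of a real vector."""
    return sum(a * a for a in x)


def parallelogram_sides(u, v):
    """Return (||u+v||^2 + ||u-v||^2, 2(||u||^2 + ||v||^2)) from 6.21."""
    plus = [a + b for a, b in zip(u, v)]
    minus = [a - b for a, b in zip(u, v)]
    return norm_sq(plus) + norm_sq(minus), 2 * (norm_sq(u) + norm_sq(v))


# Integer entries keep the arithmetic exact, so the two sides can be compared with ==.
for u, v in [((1, 2), (3, -1)), ((0, 5, -2), (4, 4, 1)), ((7,), (-3,))]:
    left, right = parallelogram_sides(u, v)
    print(u, v, left, right, left == right)
```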


Exercises 6A 


1 Prove or give a counterexample: If $v_1, \dots, v_m \in V$, then


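2 Suppose $S \in \mathcal{L}(V)$. Define $\langle \cdot, \cdot \rangle_1$ by
$$\langle u, v \rangle_1 = \langle Su, Sv \rangle$$
for all $u, v \in V$. Show that $\langle \cdot, \cdot \rangle_1$ is an inner product on $V$ if and only if $S$ is
injective.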


3 (a) Show that the function taking an ordered pair $((x_1, x_2), (y_1, y_2))$ of
elements of $\mathbf{R}^2$ to $|x_1 y_1| + |x_2 y_2|$ is not an inner product on $\mathbf{R}^2$.

(b) Show that the function taking an ordered pair $((x_1, x_2, x_3), (y_1, y_2, y_3))$
of elements of $\mathbf{R}^3$ to $x_1 y_1 + x_3 y_3$ is not an inner product on $\mathbf{R}^3$.


4 Suppose $T \in \mathcal{L}(V)$ is such that $\|Tv\| \le \|v\|$ for every $v \in V$. Prove that
$T - \sqrt{2}\, I$ is injective.
5 Suppose $V$ is a real inner product space.

(a) Show that $\langle u + v, u - v \rangle = \|u\|^2 - \|v\|^2$ for every $u, v \in V$.

(b) Show that if $u, v \in V$ have the same norm, then $u + v$ is orthogonal to
$u - v$.

(c) Use (b) to show that the diagonals of a rhombus are perpendicular to
each other.


6 Suppose $u, v \in V$. Prove that $\langle u, v \rangle = 0 \iff \|u\| \le \|u + av\|$ for all $a \in \mathbf{F}$.


7 Suppose $u, v \in V$. Prove that $\|au + bv\| = \|bu + av\|$ for all $a, b \in \mathbf{R}$ if and
only if $\|u\| = \|v\|$.


8 Suppose $a, b, c, x, y \in \mathbf{R}$ and $a^2 + b^2 + c^2 + x^2 + y^2 \le 1$. Prove that
$a + b + c + 4x + 9y \le 10$.


9 Suppose $u, v \in V$ and $\|u\| = \|v\| = 1$ and $\langle u, v \rangle = 1$. Prove that $u = v$.


10 Suppose $u, v \in V$ and $\|u\| \le 1$ and $\|v\| \le 1$. Prove that

$$\sqrt{1 - \|u\|^2}\,\sqrt{1 - \|v\|^2} \le 1 - |\langle u, v \rangle|.$$


11 Find vectors $u, v \in \mathbf{R}^2$ such that $u$ is a scalar multiple of $(1, 3)$, $v$ is orthogonal
to $(1, 3)$, and $(1, 2) = u + v$.


12 Suppose $a, b, c, d$ are positive numbers.

(a) Prove that $(a + b + c + d)\Bigl(\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}\Bigr) \ge 16$.

(b) For which positive numbers $a, b, c, d$ is the inequality above an equality?


13 Show that the square of an average is less than or equal to the average of the
squares. More precisely, show that if $a_1, \dots, a_n \in \mathbf{R}$, then the square of the
average of $a_1, \dots, a_n$ is less than or equal to the average of $a_1^2, \dots, a_n^2$.


14 Suppose $v \in V$ and $v \ne 0$. Prove that $v / \|v\|$ is the unique closest element on
the unit sphere of $V$ to $v$. More precisely, prove that if $u \in V$ and $\|u\| = 1$,
then

$$\Bigl\| v - \frac{v}{\|v\|} \Bigr\| \le \|v - u\|,$$

with equality only if $u = v / \|v\|$.

15 Suppose $u, v$ are nonzero vectors in $\mathbf{R}^2$. Prove that

$$\langle u, v \rangle = \|u\|\,\|v\| \cos\theta,$$

where $\theta$ is the angle between $u$ and $v$ (thinking of $u$ and $v$ as arrows with
initial point at the origin).

Hint: Use the law of cosines on the triangle formed by $u$, $v$, and $u - v$.
16 The angle between two vectors (thought of as arrows with initial point at
the origin) in $\mathbf{R}^2$ or $\mathbf{R}^3$ can be defined geometrically. However, geometry is
not as clear in $\mathbf{R}^n$ for $n > 3$. Thus the angle between two nonzero vectors
$x, y \in \mathbf{R}^n$ is defined to be

$$\arccos \frac{\langle x, y \rangle}{\|x\|\,\|y\|},$$

where the motivation for this definition comes from Exercise 15. Explain
why the Cauchy–Schwarz inequality is needed to show that this definition
makes sense.


17 Prove that

$$\Bigl( \sum_{k=1}^{n} a_k b_k \Bigr)^2 \le \Bigl( \sum_{k=1}^{n} k\, a_k^2 \Bigr) \Bigl( \sum_{k=1}^{n} \frac{b_k^2}{k} \Bigr)$$

for all real numbers $a_1, \dots, a_n$ and $b_1, \dots, b_n$.


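18 (a) Suppose $f \colon [1, \infty) \to [0, \infty)$ is continuous. Show that

$$\Bigl( \int_{1}^{\infty} f \Bigr)^2 \le \int_{1}^{\infty} x^2 \bigl( f(x) \bigr)^2 \, dx.$$

(b) For which continuous functions $f \colon [1, \infty) \to [0, \infty)$ is the inequality in
(a) an equality with both sides finite?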


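19 Suppose $v_1, \dots, v_n$ is a basis of $V$ and $T \in \mathcal{L}(V)$. Prove that if $\lambda$ is an
eigenvalue of $T$, then

$$|\lambda|^2 \le \sum_{j=1}^{n} \sum_{k=1}^{n} |\mathcal{M}(T)_{j,k}|^2,$$

where $\mathcal{M}(T)_{j,k}$ denotes the entry in row $j$, column $k$ of the matrix of $T$ with
respect to the basis $v_1, \dots, v_n$.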


20 Prove that if $u, v \in V$, then $\bigl|\, \|u\| - \|v\| \,\bigr| \le \|u - v\|$.

The inequality above is called the reverse triangle inequality. For the
reverse triangle inequality when $V = \mathbf{C}$, see Exercise 2 in Chapter 4.


21 Suppose $u, v \in V$ are such that

$$\|u\| = 3, \quad \|u + v\| = 4, \quad \|u - v\| = 6.$$

What number does $\|v\|$ equal?

22 Show that if $u, v \in V$, then

$$\|u + v\|\,\|u - v\| \le \|u\|^2 + \|v\|^2.$$


23 Suppose $v_1, \dots, v_m \in V$ are such that $\|v_k\| \le 1$ for each $k = 1, \dots, m$. Show
that there exist $a_1, \dots, a_m \in \{1, -1\}$ such that

$$\|a_1 v_1 + \cdots + a_m v_m\| \le \sqrt{m}.$$
24 Prove or give a counterexample: If $\|\cdot\|$ is the norm associated with an inner
product on $\mathbf{R}^2$, then there exists $(x, y) \in \mathbf{R}^2$ such that $\|(x, y)\| \ne \max\{|x|, |y|\}$.


25 Suppose $p > 0$. Prove that there is an inner product on $\mathbf{R}^2$ such that the
associated norm is given by

$$\|(x, y)\| = (|x|^p + |y|^p)^{1/p}$$

for all $(x, y) \in \mathbf{R}^2$ if and only if $p = 2$.

26 Suppose $V$ is a real inner product space. Prove that

$$\langle u, v \rangle = \frac{\|u + v\|^2 - \|u - v\|^2}{4}$$

for all $u, v \in V$.

27 Suppose $V$ is a complex inner product space. Prove that

$$\langle u, v \rangle = \frac{\|u + v\|^2 - \|u - v\|^2 + \|u + iv\|^2 i - \|u - iv\|^2 i}{4}$$

for all $u, v \in V$.
28 A norm on a vector space $U$ is a function

$$\|\cdot\| \colon U \to [0, \infty)$$

such that $\|u\| = 0$ if and only if $u = 0$, $\|\alpha u\| = |\alpha|\,\|u\|$ for all $\alpha \in \mathbf{F}$ and all
$u \in U$, and $\|u + v\| \le \|u\| + \|v\|$ for all $u, v \in U$. Prove that a norm satisfying
the parallelogram equality comes from an inner product (in other words,
show that if $\|\cdot\|$ is a norm on $U$ satisfying the parallelogram equality, then
there is an inner product $\langle \cdot, \cdot \rangle$ on $U$ such that $\|u\| = \langle u, u \rangle^{1/2}$ for all $u \in U$).


29 Suppose $V_1, \dots, V_m$ are inner product spaces. Show that the equation

$$\langle (u_1, \dots, u_m), (v_1, \dots, v_m) \rangle = \langle u_1, v_1 \rangle + \cdots + \langle u_m, v_m \rangle$$

defines an inner product on $V_1 \times \cdots \times V_m$.

In the expression above on the right, for each $k = 1, \dots, m$, the inner product
$\langle u_k, v_k \rangle$ denotes the inner product on $V_k$. Each of the spaces $V_1, \dots, V_m$ may
have a different inner product, even though the same notation is used here.


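30 Suppose $V$ is a real inner product space. For $u, v, w, x \in V$, define

$$\langle u + iv, w + ix \rangle_{\mathbf{C}} = \langle u, w \rangle + \langle v, x \rangle + \bigl( \langle v, w \rangle - \langle u, x \rangle \bigr) i.$$

(a) Show that $\langle \cdot, \cdot \rangle_{\mathbf{C}}$ makes $V_{\mathbf{C}}$ into a complex inner product space.
(b) Show that if $u, v \in V$, then

$$\langle u, v \rangle_{\mathbf{C}} = \langle u, v \rangle \quad \text{and} \quad \|u + iv\|_{\mathbf{C}}^2 = \|u\|^2 + \|v\|^2.$$

See Exercise 8 in Section 1B for the definition of the complexification $V_{\mathbf{C}}$.
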
31 Suppose $u, v, w \in V$. Prove that

$$\Bigl\| w - \tfrac{1}{2}(u + v) \Bigr\|^2 = \frac{\|w - u\|^2 + \|w - v\|^2}{2} - \frac{\|u - v\|^2}{4}.$$

32 Suppose that $E$ is a subset of $V$ with the property that $u, v \in E$ implies
$\tfrac{1}{2}(u + v) \in E$. Let $w \in V$. Show that there is at most one point in $E$ that is
closest to $w$. In other words, show that there is at most one $u \in E$ such that

$$\|w - u\| \le \|w - x\|$$

for all $x \in E$.


33 Suppose $f, g$ are differentiable functions from $\mathbf{R}$ to $\mathbf{R}^n$.
(a) Show that

$$\langle f(t), g(t) \rangle' = \langle f'(t), g(t) \rangle + \langle f(t), g'(t) \rangle.$$

(b) Suppose $c$ is a positive number and $\|f(t)\| = c$ for every $t \in \mathbf{R}$. Show
that $\langle f'(t), f(t) \rangle = 0$ for every $t \in \mathbf{R}$.
(c) Interpret the result in (b) geometrically in terms of the tangent vector to
a curve lying on a sphere in $\mathbf{R}^n$ centered at the origin.

A function $f \colon \mathbf{R} \to \mathbf{R}^n$ is called differentiable if there exist differentiable
functions $f_1, \dots, f_n$ from $\mathbf{R}$ to $\mathbf{R}$ such that $f(t) = (f_1(t), \dots, f_n(t))$ for each
$t \in \mathbf{R}$. Furthermore, for each $t \in \mathbf{R}$, the derivative $f'(t) \in \mathbf{R}^n$ is defined by

$$f'(t) = (f_1'(t), \dots, f_n'(t)).$$


34 Use inner products to prove Apollonius's identity: In a triangle with sides of
length $a$, $b$, and $c$, let $d$ be the length of the line segment from the midpoint
of the side of length $c$ to the opposite vertex. Then

$$a^2 + b^2 = \tfrac{1}{2} c^2 + 2d^2.$$
35 Fix a positive integer $n$. The Laplacian $\Delta p$ of a twice differentiable real-valued
function $p$ on $\mathbf{R}^n$ is the function on $\mathbf{R}^n$ defined by

$$\Delta p = \frac{\partial^2 p}{\partial x_1^2} + \cdots + \frac{\partial^2 p}{\partial x_n^2}.$$

The function $p$ is called harmonic if $\Delta p = 0$.

A polynomial on $\mathbf{R}^n$ is a linear combination (with coefficients in $\mathbf{R}$) of
functions of the form $x_1^{m_1} \cdots x_n^{m_n}$, where $m_1, \dots, m_n$ are nonnegative integers.

Suppose $q$ is a polynomial on $\mathbf{R}^n$. Prove that there exists a harmonic
polynomial $p$ on $\mathbf{R}^n$ such that $p(x) = q(x)$ for every $x \in \mathbf{R}^n$ with $\|x\| = 1$.

The only fact about harmonic functions that you need for this exercise is
that if $p$ is a harmonic function on $\mathbf{R}^n$ and $p(x) = 0$ for all $x \in \mathbf{R}^n$ with
$\|x\| = 1$, then $p = 0$.

Hint: A reasonable guess is that the desired harmonic polynomial $p$ is of the
form $q + (1 - \|x\|^2) r$ for some polynomial $r$. Prove that there is a polynomial
$r$ on $\mathbf{R}^n$ such that $q + (1 - \|x\|^2) r$ is harmonic by defining an operator $T$ on
a suitable vector space by

$$Tr = \Delta\bigl( (1 - \|x\|^2) r \bigr)$$

and then showing that $T$ is injective and hence surjective.


In realms of numbers, where the secrets lie, 

A noble truth emerges from the deep, 

Cauchy and Schwarz, their wisdom they apply, 
An inequality for all to keep. 


Two vectors, by this bond, are intertwined, 
As inner products weave a gilded thread, 
Their magnitude, by providence, confined, 
A bound to which their destiny is wed. 


Though shadows fall, and twilight dims the day, 
This inequality will stand the test, 

To guide us in our quest, to light the way, 

And in its truth, our understanding rest. 


So sing, ye muses, of this noble feat, 
Cauchy—Schwarz, the bound that none can beat. 


—written by ChatGPT with input Shakespearean sonnet on Cauchy—Schwarz inequality 
6B Orthonormal Bases


Orthonormal Lists and the Gram—Schmidt Procedure 


6.22 definition: orthonormal 


• A list of vectors is called orthonormal if each vector in the list has norm 1
and is orthogonal to all the other vectors in the list.

• In other words, a list $e_1, \dots, e_m$ of vectors in $V$ is orthonormal if

$$\langle e_j, e_k \rangle = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k \end{cases}$$

for all $j, k \in \{1, \dots, m\}$.


6.23 example: orthonormal lists 


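(a) The standard basis of $\mathbf{F}^n$ is an orthonormal list.

(b) $\bigl( \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}} \bigr), \bigl( -\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0 \bigr)$ is an orthonormal list in $\mathbf{F}^3$.

(c) $\bigl( \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}} \bigr), \bigl( -\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0 \bigr), \bigl( \tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}, -\tfrac{2}{\sqrt{6}} \bigr)$ is an orthonormal list in $\mathbf{F}^3$.

(d) Suppose $n$ is a positive integer. Then, as Exercise 4 asks you to verify,

$$\frac{1}{\sqrt{2\pi}}, \frac{\cos x}{\sqrt{\pi}}, \frac{\cos 2x}{\sqrt{\pi}}, \dots, \frac{\cos nx}{\sqrt{\pi}}, \frac{\sin x}{\sqrt{\pi}}, \frac{\sin 2x}{\sqrt{\pi}}, \dots, \frac{\sin nx}{\sqrt{\pi}}$$

is an orthonormal list of vectors in $C[-\pi, \pi]$, the vector space of continuous
real-valued functions on $[-\pi, \pi]$ with inner product

$$\langle f, g \rangle = \int_{-\pi}^{\pi} f g.$$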


The orthonormal list above is often used for modeling periodic phenomena, 
such as tides. 


(e) Suppose we make $\mathcal{P}_2(\mathbf{R})$ into an inner product space using the inner product
given by

$$\langle p, q \rangle = \int_{-1}^{1} p q$$

for all $p, q \in \mathcal{P}_2(\mathbf{R})$. The standard basis $1, x, x^2$ of $\mathcal{P}_2(\mathbf{R})$ is not an orthonormal
list because the vectors in that list do not have norm 1. Dividing each
vector by its norm gives the list $1/\sqrt{2}, \sqrt{3/2}\,x, \sqrt{5/2}\,x^2$, in which each vector
has norm 1, and the second vector is orthogonal to the first and third vectors.
However, the first and third vectors are not orthogonal. Thus this is not an
orthonormal list. Soon we will see how to construct an orthonormal list from
the standard basis $1, x, x^2$ (see Example 6.34).
Orthonormal lists are particularly easy to work with, as illustrated by the next
result.


6.24 norm of an orthonormal linear combination


Suppose $e_1, \dots, e_m$ is an orthonormal list of vectors in $V$. Then

$$\|a_1 e_1 + \cdots + a_m e_m\|^2 = |a_1|^2 + \cdots + |a_m|^2$$

for all $a_1, \dots, a_m \in \mathbf{F}$.


Proof Because each $e_k$ has norm 1, this follows from repeated applications of
the Pythagorean theorem (6.12).


The result above has the following important corollary. 


6.25 orthonormal lists are linearly independent 
Every orthonormal list of vectors is linearly independent. 


Proof Suppose $e_1, \dots, e_m$ is an orthonormal list of vectors in $V$ and $a_1, \dots, a_m \in \mathbf{F}$
are such that

$$a_1 e_1 + \cdots + a_m e_m = 0.$$

Then $|a_1|^2 + \cdots + |a_m|^2 = 0$ (by 6.24), which means that all the $a_k$'s are 0. Thus
$e_1, \dots, e_m$ is linearly independent.


Now we come to an important inequality. 


6.26 Bessel’s inequality 


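Suppose $e_1, \dots, e_m$ is an orthonormal list of vectors in $V$. If $v \in V$, then

$$|\langle v, e_1 \rangle|^2 + \cdots + |\langle v, e_m \rangle|^2 \le \|v\|^2.$$


Proof Suppose $v \in V$. Then

$$v = \underbrace{\langle v, e_1 \rangle e_1 + \cdots + \langle v, e_m \rangle e_m}_{u} + \underbrace{v - \langle v, e_1 \rangle e_1 - \cdots - \langle v, e_m \rangle e_m}_{w}.$$

Let $u$ and $w$ be defined as in the equation above. If $k \in \{1, \dots, m\}$, then
$\langle w, e_k \rangle = \langle v, e_k \rangle - \langle v, e_k \rangle \langle e_k, e_k \rangle = 0$. This implies that $\langle w, u \rangle = 0$. The
Pythagorean theorem now implies that

$$\begin{aligned}
\|v\|^2 &= \|u\|^2 + \|w\|^2 \\
&\ge \|u\|^2 \\
&= |\langle v, e_1 \rangle|^2 + \cdots + |\langle v, e_m \rangle|^2,
\end{aligned}$$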


where the last line comes from 6.24. 
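Bessel's inequality can be illustrated with an orthonormal list that is too short to be a basis. The sketch below uses the two vectors of Example 6.23(b) in $\mathbf{R}^3$; the function name `bessel_sides` and the sample vector are assumptions made for illustration.

```python
import math


def inner(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


# An orthonormal list of length 2 in R^3 (the list from Example 6.23(b)).
e1 = [1 / math.sqrt(3)] * 3
e2 = [-1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]


def bessel_sides(v, orthonormal_list):
    """Return (sum of |<v, e_k>|^2, ||v||^2); 6.26 says the first is <= the second."""
    lhs = sum(inner(v, e) ** 2 for e in orthonormal_list)
    return lhs, inner(v, v)


lhs, rhs = bessel_sides([1.0, 2.0, 3.0], [e1, e2])
print(lhs, rhs, lhs <= rhs)
```

The difference between the two sides is $\|w\|^2$ from the proof, the squared norm of the part of $v$ that the orthonormal list does not capture.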
The next definition introduces one of the most useful concepts in the study of
inner product spaces. 


6.27 definition: orthonormal basis 


An orthonormal basis of V is an orthonormal list of vectors in V that is also a 
basis of V. 


For example, the standard basis is an orthonormal basis of $\mathbf{F}^n$.


6.28 orthonormal lists of the right length are orthonormal bases 


Suppose V is finite-dimensional. Then every orthonormal list of vectors in V 
of length dim V is an orthonormal basis of V. 


Proof By 6.25, every orthonormal list of vectors in V is linearly independent. 
Thus every such list of the right length is a basis—see 2.38. 


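6.29 example: an orthonormal basis of $\mathbf{F}^4$


As mentioned above, the standard basis is an orthonormal basis of $\mathbf{F}^4$. We now
show that

$$\bigl( \tfrac12, \tfrac12, \tfrac12, \tfrac12 \bigr), \bigl( \tfrac12, \tfrac12, -\tfrac12, -\tfrac12 \bigr), \bigl( \tfrac12, -\tfrac12, -\tfrac12, \tfrac12 \bigr), \bigl( -\tfrac12, \tfrac12, -\tfrac12, \tfrac12 \bigr)$$

is also an orthonormal basis of $\mathbf{F}^4$.

We have

$$\bigl\| \bigl( \tfrac12, \tfrac12, \tfrac12, \tfrac12 \bigr) \bigr\| = \sqrt{\tfrac14 + \tfrac14 + \tfrac14 + \tfrac14} = 1.$$

Similarly, the other three vectors in the list above also have norm 1.
Note that

$$\bigl\langle \bigl( \tfrac12, \tfrac12, \tfrac12, \tfrac12 \bigr), \bigl( \tfrac12, \tfrac12, -\tfrac12, -\tfrac12 \bigr) \bigr\rangle = \tfrac12 \cdot \tfrac12 + \tfrac12 \cdot \tfrac12 + \tfrac12 \cdot \bigl( -\tfrac12 \bigr) + \tfrac12 \cdot \bigl( -\tfrac12 \bigr) = 0.$$

Similarly, the inner product of any two distinct vectors in the list above also
equals 0.

Thus the list above is orthonormal. Because we have an orthonormal list of
length four in the four-dimensional vector space $\mathbf{F}^4$, this list is an orthonormal
basis of $\mathbf{F}^4$ (by 6.28).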
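The remaining norm and orthogonality checks in this example are routine and can be delegated to a few lines of code. This is a sketch, with the four vectors written out as they appear in the expansion in Example 6.31; exact fractions are used so that the comparisons with 0 and 1 are exact, and the helper name `is_orthonormal` is an illustrative choice.

```python
from fractions import Fraction

h = Fraction(1, 2)
vectors = [(h, h, h, h), (h, h, -h, -h), (h, -h, -h, h), (-h, h, -h, h)]


def inner(u, v):
    """Euclidean inner product; the entries here are real, so no conjugates are needed."""
    return sum(a * b for a, b in zip(u, v))


def is_orthonormal(vs):
    """True when <e_j, e_k> is 1 for j == k and 0 otherwise (definition 6.22)."""
    return all(
        inner(ej, ek) == (1 if j == k else 0)
        for j, ej in enumerate(vs)
        for k, ek in enumerate(vs)
    )


print(is_orthonormal(vectors))
```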


In general, given a basis $e_1, \dots, e_n$ of $V$ and a vector $v \in V$, we know that there
is some choice of scalars $a_1, \dots, a_n \in \mathbf{F}$ such that

$$v = a_1 e_1 + \cdots + a_n e_n.$$

Computing the numbers $a_1, \dots, a_n$ that satisfy the equation above can be a long
computation for an arbitrary basis of $V$. The next result shows, however, that this
is easy for an orthonormal basis: just take $a_k = \langle v, e_k \rangle$.
Notice how the next result makes each inner product space of dimension
$n$ behave like $\mathbf{F}^n$, with the role of the coordinates of a vector in $\mathbf{F}^n$ played by
$\langle v, e_1 \rangle, \dots, \langle v, e_n \rangle$.

The formula below for $\|v\|$ is called Parseval's identity. It was published in
1799 in the context of Fourier series.


6.30 writing a vector as a linear combination of an orthonormal basis 


Suppose $e_1, \dots, e_n$ is an orthonormal basis of $V$ and $u, v \in V$. Then

(a) $v = \langle v, e_1 \rangle e_1 + \cdots + \langle v, e_n \rangle e_n$,

(b) $\|v\|^2 = |\langle v, e_1 \rangle|^2 + \cdots + |\langle v, e_n \rangle|^2$,

(c) $\langle u, v \rangle = \langle u, e_1 \rangle \overline{\langle v, e_1 \rangle} + \cdots + \langle u, e_n \rangle \overline{\langle v, e_n \rangle}$.


Proof Because $e_1, \dots, e_n$ is a basis of $V$, there exist scalars $a_1, \dots, a_n$ such that

$$v = a_1 e_1 + \cdots + a_n e_n.$$

Because $e_1, \dots, e_n$ is orthonormal, taking the inner product of both sides of this
equation with $e_k$ gives $\langle v, e_k \rangle = a_k$. Thus (a) holds.

Now (b) follows immediately from (a) and 6.24.

Taking the inner product of $u$ with each side of (a) and then using the conjugate
symmetry of the inner product gives (c).


6.31 example: finding coefficients for a linear combination 


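Suppose we want to write the vector $(1, 2, 4, 7) \in \mathbf{F}^4$ as a linear combination
of the orthonormal basis

$$\bigl( \tfrac12, \tfrac12, \tfrac12, \tfrac12 \bigr), \bigl( \tfrac12, \tfrac12, -\tfrac12, -\tfrac12 \bigr), \bigl( \tfrac12, -\tfrac12, -\tfrac12, \tfrac12 \bigr), \bigl( -\tfrac12, \tfrac12, -\tfrac12, \tfrac12 \bigr)$$

of $\mathbf{F}^4$ from Example 6.29. Instead of solving a system of four linear equations
in four unknowns, as typically would be required if we were working with a
nonorthonormal basis, we simply evaluate four inner products and use 6.30(a),
getting that $(1, 2, 4, 7)$ equals

$$7 \bigl( \tfrac12, \tfrac12, \tfrac12, \tfrac12 \bigr) - 4 \bigl( \tfrac12, \tfrac12, -\tfrac12, -\tfrac12 \bigr) + \bigl( \tfrac12, -\tfrac12, -\tfrac12, \tfrac12 \bigr) + 2 \bigl( -\tfrac12, \tfrac12, -\tfrac12, \tfrac12 \bigr).$$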
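The four inner products in this example can be evaluated mechanically. The following sketch (function and variable names are illustrative, not from the text) computes the coefficients $\langle v, e_k \rangle$ with exact fractions, rebuilds the vector from them as in 6.30(a), and compares the two sides of 6.30(b).

```python
from fractions import Fraction

h = Fraction(1, 2)
basis = [(h, h, h, h), (h, h, -h, -h), (h, -h, -h, h), (-h, h, -h, h)]


def inner(u, v):
    """Euclidean inner product; the entries here are real, so no conjugates are needed."""
    return sum(a * b for a, b in zip(u, v))


def coefficients(v, orthonormal_basis):
    """Coefficients <v, e_k> of v with respect to an orthonormal basis, as in 6.30(a)."""
    return [inner(v, e) for e in orthonormal_basis]


v = (1, 2, 4, 7)
coeffs = coefficients(v, basis)
print(coeffs)

# Rebuild v from the coefficients and compare norms squared (6.30(a) and 6.30(b)).
rebuilt = tuple(sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(4))
print(rebuilt == v, sum(c * c for c in coeffs) == inner(v, v))
```

For a basis that is not orthonormal, the same coefficients would require solving a linear system, which is the contrast the example draws.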


Now that we understand the usefulness of orthonormal bases, how do we go
about finding them? For example, does $\mathcal{P}_m(\mathbf{R})$ with inner product as in 6.3(c)
have an orthonormal basis? The next result will lead to answers to these questions.

The algorithm used in the next proof is called the Gram–Schmidt procedure.
It gives a method for turning a linearly independent list into an orthonormal list
with the same span as the original list.

Jørgen Gram (1850–1916) and Erhard Schmidt (1876–1959) popularized this
algorithm that constructs orthonormal lists.
6.32 Gram–Schmidt procedure


Suppose $v_1, \dots, v_m$ is a linearly independent list of vectors in $V$. Let $f_1 = v_1$.
For $k = 2, \dots, m$, define $f_k$ inductively by

$$f_k = v_k - \frac{\langle v_k, f_1 \rangle}{\|f_1\|^2} f_1 - \cdots - \frac{\langle v_k, f_{k-1} \rangle}{\|f_{k-1}\|^2} f_{k-1}.$$

For each $k = 1, \dots, m$, let $e_k = \dfrac{f_k}{\|f_k\|}$. Then $e_1, \dots, e_m$ is an orthonormal list of
vectors in $V$ such that

$$\operatorname{span}(v_1, \dots, v_k) = \operatorname{span}(e_1, \dots, e_k)$$

for each $k = 1, \dots, m$.


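Proof We will show by induction on $k$ that the desired conclusion holds. To
get started with $k = 1$, note that because $e_1 = f_1 / \|f_1\|$, we have $\|e_1\| = 1$; also,
$\operatorname{span}(v_1) = \operatorname{span}(e_1)$ because $e_1$ is a nonzero multiple of $v_1$.

Suppose $1 < k \le m$ and the list $e_1, \dots, e_{k-1}$ generated by 6.32 is an orthonormal
list such that

$$\operatorname{span}(v_1, \dots, v_{k-1}) = \operatorname{span}(e_1, \dots, e_{k-1}). \qquad \text{(6.33)}$$

Because $v_1, \dots, v_m$ is linearly independent, we have $v_k \notin \operatorname{span}(v_1, \dots, v_{k-1})$. Thus
$v_k \notin \operatorname{span}(e_1, \dots, e_{k-1}) = \operatorname{span}(f_1, \dots, f_{k-1})$, which implies that $f_k \ne 0$. Hence
we are not dividing by 0 in the definition of $e_k$ given in 6.32. Dividing a vector by
its norm produces a new vector with norm 1; thus $\|e_k\| = 1$.

Let $j \in \{1, \dots, k - 1\}$. Then

$$\begin{aligned}
\langle e_k, e_j \rangle &= \frac{1}{\|f_k\|\,\|f_j\|} \langle f_k, f_j \rangle \\
&= \frac{1}{\|f_k\|\,\|f_j\|} \Bigl\langle v_k - \frac{\langle v_k, f_1 \rangle}{\|f_1\|^2} f_1 - \cdots - \frac{\langle v_k, f_{k-1} \rangle}{\|f_{k-1}\|^2} f_{k-1},\ f_j \Bigr\rangle \\
&= \frac{1}{\|f_k\|\,\|f_j\|} \bigl( \langle v_k, f_j \rangle - \langle v_k, f_j \rangle \bigr) \\
&= 0.
\end{aligned}$$

Thus $e_1, \dots, e_k$ is an orthonormal list.

From the definition of $e_k$ given in 6.32, we see that $v_k \in \operatorname{span}(e_1, \dots, e_k)$.
Combining this information with 6.33 shows that

$$\operatorname{span}(v_1, \dots, v_k) \subseteq \operatorname{span}(e_1, \dots, e_k).$$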


Both lists above are linearly independent (the v’s by hypothesis, and the e’s by 
orthonormality and 6.25). Thus both subspaces above have dimension k, and 
hence they are equal, completing the induction step and thus completing the 
proof. 
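The procedure in 6.32 translates almost line by line into code. The following is a minimal sketch for vectors in $\mathbf{F}^n$ with the Euclidean inner product, using floating-point arithmetic; the names `gram_schmidt`, `inner`, `norm` and the tolerance `tol` are assumptions made for illustration. It follows the formula of 6.32 directly, taking inner products of the original $v_k$ with each $f_j$. Numerical libraries often prefer variants that behave better under rounding, which this sketch does not attempt.

```python
import math


def inner(u, v):
    """Euclidean inner product on F^n (conjugate taken in the second slot)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


def norm(v):
    return math.sqrt(inner(v, v).real)


def gram_schmidt(vectors, tol=1e-12):
    """Turn a linearly independent list v_1, ..., v_m into an orthonormal list
    e_1, ..., e_m with span(v_1..v_k) = span(e_1..e_k), following 6.32."""
    fs = []  # the unnormalized vectors f_1, ..., f_k
    es = []  # the normalized vectors e_1, ..., e_k
    for v in vectors:
        f = [complex(a) for a in v]
        for fj in fs:
            # Subtract (<v_k, f_j> / ||f_j||^2) f_j, exactly as in the formula.
            coeff = inner(v, fj) / inner(fj, fj).real
            f = [a - coeff * b for a, b in zip(f, fj)]
        n = norm(f)
        if n <= tol:
            # f_k = 0 happens only when v_k lies in the span of the earlier vectors.
            raise ValueError("list is not linearly independent")
        fs.append(f)
        es.append([a / n for a in f])
    return es


es = gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
for j, ej in enumerate(es):
    print([round(abs(inner(ej, ek)), 12) for ek in es])
```

The printed rows should form the identity matrix up to rounding, which is the orthonormality condition of 6.22.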
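6.34 example: an orthonormal basis of $\mathcal{P}_2(\mathbf{R})$


Suppose we make $\mathcal{P}_2(\mathbf{R})$ into an inner product space using the inner product
given by

$$\langle p, q \rangle = \int_{-1}^{1} p q$$

for all $p, q \in \mathcal{P}_2(\mathbf{R})$. We know that $1, x, x^2$ is a basis of $\mathcal{P}_2(\mathbf{R})$, but it is not an
orthonormal basis. We will find an orthonormal basis of $\mathcal{P}_2(\mathbf{R})$ by applying the
Gram–Schmidt procedure with $v_1 = 1$, $v_2 = x$, and $v_3 = x^2$.

To get started, take $f_1 = v_1 = 1$. Thus $\|f_1\|^2 = \int_{-1}^{1} 1 = 2$. Hence the formula
in 6.32 tells us that

$$f_2 = v_2 - \frac{\langle v_2, f_1 \rangle}{\|f_1\|^2} f_1 = x - \frac{\langle x, 1 \rangle}{\|f_1\|^2} = x,$$

where the last equality holds because $\langle x, 1 \rangle = \int_{-1}^{1} t \, dt = 0$.

The formula above for $f_2$ implies that $\|f_2\|^2 = \int_{-1}^{1} t^2 \, dt = \tfrac{2}{3}$. Now the formula
in 6.32 tells us that

$$\begin{aligned}
f_3 &= v_3 - \frac{\langle v_3, f_1 \rangle}{\|f_1\|^2} f_1 - \frac{\langle v_3, f_2 \rangle}{\|f_2\|^2} f_2 \\
&= x^2 - \tfrac{1}{2} \langle x^2, 1 \rangle - \tfrac{3}{2} \langle x^2, x \rangle x \\
&= x^2 - \tfrac{1}{3}.
\end{aligned}$$

The formula above for $f_3$ implies that

$$\|f_3\|^2 = \int_{-1}^{1} \bigl( t^2 - \tfrac{1}{3} \bigr)^2 dt = \int_{-1}^{1} \bigl( t^4 - \tfrac{2}{3} t^2 + \tfrac{1}{9} \bigr) dt = \tfrac{8}{45}.$$

Now dividing each of $f_1, f_2, f_3$ by its norm gives us the orthonormal list

$$\sqrt{\tfrac{1}{2}},\ \sqrt{\tfrac{3}{2}}\, x,\ \sqrt{\tfrac{45}{8}} \bigl( x^2 - \tfrac{1}{3} \bigr).$$

The orthonormal list above has length three, which is the dimension of $\mathcal{P}_2(\mathbf{R})$.
Hence this orthonormal list is an orthonormal basis of $\mathcal{P}_2(\mathbf{R})$ [by 6.28].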
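The integrals in Example 6.34 can be double-checked with exact rational arithmetic. In the sketch below, a polynomial is represented by its coefficient list `[c0, c1, c2]` (a representation chosen here for illustration), the inner product uses the fact that the integral of $t^k$ over $[-1, 1]$ is $2/(k+1)$ for even $k$ and 0 for odd $k$, and the unnormalized vectors $f_1, f_2, f_3$ of 6.32 are computed without taking square roots.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def inner(p, q):
    """<p, q> = integral of p*q over [-1, 1], computed exactly."""
    total = Fraction(0)
    for k, c in enumerate(poly_mul(p, q)):
        if k % 2 == 0:  # odd powers integrate to 0 over [-1, 1]
            total += c * Fraction(2, k + 1)
    return total


def gram_schmidt_unnormalized(basis):
    """Return the vectors f_1, ..., f_m of 6.32 (before dividing by norms)."""
    fs = []
    for v in basis:
        f = [Fraction(c) for c in v]
        for fj in fs:
            coeff = inner(v, fj) / inner(fj, fj)
            f = [a - coeff * b for a, b in zip(f, fj)]
        fs.append(f)
    return fs


# The basis 1, x, x^2 of P_2(R) as coefficient lists.
fs = gram_schmidt_unnormalized([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
for f in fs:
    print(f, inner(f, f))
```

The printed squared norms are the quantities $\|f_1\|^2, \|f_2\|^2, \|f_3\|^2$ whose reciprocals appear under the square roots in the orthonormal list of the example.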


Now we can answer the question about the existence of orthonormal bases. 


6.35 existence of orthonormal basis 


Every finite-dimensional inner product space has an orthonormal basis. 


Proof Suppose V is finite-dimensional. Choose a basis of V. Apply the Gram— 
Schmidt procedure (6.32) to it, producing an orthonormal list of length dim V. 
By 6.28, this orthonormal list is an orthonormal basis of V. 


Sometimes we need to know not only that an orthonormal basis exists, but also 
that every orthonormal list can be extended to an orthonormal basis. In the next 
corollary, the Gram—Schmidt procedure shows that such an extension is always 
possible. 
6.36 every orthonormal list extends to an orthonormal basis


Suppose $V$ is finite-dimensional. Then every orthonormal list of vectors in $V$
can be extended to an orthonormal basis of $V$.


Proof Suppose $e_1, \dots, e_m$ is an orthonormal list of vectors in $V$. Then $e_1, \dots, e_m$
is linearly independent (by 6.25). Hence this list can be extended to a basis
$e_1, \dots, e_m, v_1, \dots, v_n$ of $V$ (see 2.32). Now apply the Gram–Schmidt procedure
(6.32) to $e_1, \dots, e_m, v_1, \dots, v_n$, producing an orthonormal list

$$e_1, \dots, e_m, f_1, \dots, f_n;$$

here the formula given by the Gram–Schmidt procedure leaves the first $m$ vectors
unchanged because they are already orthonormal. The list above is an orthonormal
basis of $V$ by 6.28.


Recall that a matrix is called upper triangular if it looks like this:

$$\begin{pmatrix} * & & * \\ & \ddots & \\ 0 & & * \end{pmatrix},$$

where the 0 in the matrix above indicates that all entries below the diagonal
equal 0, and asterisks are used to denote entries on and above the diagonal.

In the last chapter, we gave a necessary and sufficient condition for an operator 
to have an upper-triangular matrix with respect to some basis (see 5.44). Now that 
we are dealing with inner product spaces, we would like to know whether there 
exists an orthonormal basis with respect to which we have an upper-triangular 
matrix. The next result shows that the condition for an operator to have an
upper-triangular matrix with respect to some orthonormal basis is the same as the
condition to have an upper-triangular matrix with respect to an arbitrary basis.


6.37 upper-triangular matrix with respect to some orthonormal basis 


Suppose V is finite-dimensional and $T \in \mathcal{L}(V)$. Then T has an
upper-triangular matrix with respect to some orthonormal basis of V if and only if the
minimal polynomial of T equals $(z - \lambda_1)\cdots(z - \lambda_m)$ for some $\lambda_1, \dots, \lambda_m \in \mathbf{F}$.


Proof Suppose T has an upper-triangular matrix with respect to some basis
$v_1, \dots, v_n$ of V. Thus $\operatorname{span}(v_1, \dots, v_k)$ is invariant under T for each $k = 1, \dots, n$
(see 5.39).

Apply the Gram–Schmidt procedure to $v_1, \dots, v_n$, producing an orthonormal
basis $e_1, \dots, e_n$ of V. Because

$$\operatorname{span}(e_1, \dots, e_k) = \operatorname{span}(v_1, \dots, v_k)$$

for each k (see 6.32), we conclude that $\operatorname{span}(e_1, \dots, e_k)$ is invariant under T for
each $k = 1, \dots, n$. Thus, by 5.39, T has an upper-triangular matrix with respect to
the orthonormal basis $e_1, \dots, e_n$. Now use 5.44 to complete the proof.


For complex vector spaces, the next result is an important application of the
result above. See Exercise 20 for a version of Schur's theorem that applies
simultaneously to more than one operator.

Issai Schur (1875–1941) published a proof of the next result in 1909.


6.38 Schur’s theorem 


Every operator on a finite-dimensional complex inner product space has an 
upper-triangular matrix with respect to some orthonormal basis. 


Proof The desired result follows from the second version of the fundamental 
theorem of algebra (4.13) and 6.37. 


Linear Functionals on Inner Product Spaces 


Because linear maps into the scalar field F play a special role, we defined a special 
name for them and their vector space in Section 3F. Those definitions are repeated 
below in case you skipped Section 3F. 


6.39 definition: linear functional, dual space, V' 


• A linear functional on V is a linear map from V to F.


• The dual space of V, denoted by $V'$, is the vector space of all linear
functionals on V. In other words, $V' = \mathcal{L}(V, \mathbf{F})$.


6.40 example: linear functional on $\mathbf{F}^3$


The function $\varphi\colon \mathbf{F}^3 \to \mathbf{F}$ defined by

$$\varphi(z_1, z_2, z_3) = 2z_1 - 5z_2 + z_3$$

is a linear functional on $\mathbf{F}^3$. We could write this linear functional in the form

$$\varphi(z) = \langle z, w\rangle$$

for every $z \in \mathbf{F}^3$, where $w = (2, -5, 1)$.


6.41 example: linear functional on $\mathcal{P}_5(\mathbf{R})$


The function $\varphi\colon \mathcal{P}_5(\mathbf{R}) \to \mathbf{R}$ defined by

$$\varphi(p) = \int_{-1}^{1} p(t)\cos(\pi t)\,dt$$

is a linear functional on $\mathcal{P}_5(\mathbf{R})$.


If $v \in V$, then the map that sends $u$ to $\langle u, v\rangle$ is a linear functional on V. The
next result states that every linear functional on V is of this form. For example,
we can take $v = (2, -5, 1)$ in Example 6.40.

The next result is named in honor of Frigyes Riesz (1880–1956), who proved
several theorems early in the twentieth century that look very much like the
result below.

Suppose we make the vector space $\mathcal{P}_5(\mathbf{R})$ into an inner product space by
defining $\langle p, q\rangle = \int_{-1}^{1} pq$. Let $\varphi$ be as in Example 6.41. It is not obvious that there
exists $q \in \mathcal{P}_5(\mathbf{R})$ such that

$$\int_{-1}^{1} p(t)\cos(\pi t)\,dt = \langle p, q\rangle$$

for every $p \in \mathcal{P}_5(\mathbf{R})$ [we cannot take $q(t) = \cos(\pi t)$ because that choice of q is
not an element of $\mathcal{P}_5(\mathbf{R})$]. The next result tells us the somewhat surprising fact
that there indeed exists a polynomial $q \in \mathcal{P}_5(\mathbf{R})$ such that the equation above
holds for all $p \in \mathcal{P}_5(\mathbf{R})$.


6.42 Riesz representation theorem 


Suppose V is finite-dimensional and $\varphi$ is a linear functional on V. Then there
is a unique vector $v \in V$ such that

$$\varphi(u) = \langle u, v\rangle$$

for every $u \in V$.


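Proof First we show that there exists a vector $v \in V$ such that $\varphi(u) = \langle u, v\rangle$ for
every $u \in V$. Let $e_1, \dots, e_n$ be an orthonormal basis of V. Then

$$\begin{aligned}
\varphi(u) &= \varphi\bigl(\langle u, e_1\rangle e_1 + \cdots + \langle u, e_n\rangle e_n\bigr) \\
&= \langle u, e_1\rangle \varphi(e_1) + \cdots + \langle u, e_n\rangle \varphi(e_n) \\
&= \bigl\langle u, \overline{\varphi(e_1)}\, e_1 + \cdots + \overline{\varphi(e_n)}\, e_n \bigr\rangle
\end{aligned}$$

for every $u \in V$, where the first equality comes from 6.30(a). Thus setting

6.43    $v = \overline{\varphi(e_1)}\, e_1 + \cdots + \overline{\varphi(e_n)}\, e_n,$

we have $\varphi(u) = \langle u, v\rangle$ for every $u \in V$, as desired.
Now we prove that only one vector $v \in V$ has the desired behavior. Suppose
$v_1, v_2 \in V$ are such that

$$\varphi(u) = \langle u, v_1\rangle = \langle u, v_2\rangle$$

for every $u \in V$. Then

$$0 = \langle u, v_1\rangle - \langle u, v_2\rangle = \langle u, v_1 - v_2\rangle$$

for every $u \in V$. Taking $u = v_1 - v_2$ shows that $v_1 - v_2 = 0$. Thus $v_1 = v_2$,
completing the proof of the uniqueness part of the result.


6.44 example: computation illustrating Riesz representation theorem


Suppose we want to find a polynomial $q \in \mathcal{P}_2(\mathbf{R})$ such that

6.45    $\int_{-1}^{1} p(t)\cos(\pi t)\,dt = \int_{-1}^{1} pq$

for every polynomial $p \in \mathcal{P}_2(\mathbf{R})$. To do this, we make $\mathcal{P}_2(\mathbf{R})$ into an inner product
space by defining $\langle p, q\rangle$ to be the right side of the equation above for $p, q \in \mathcal{P}_2(\mathbf{R})$.
Note that the left side of the equation above does not equal the inner product
in $\mathcal{P}_2(\mathbf{R})$ of p and the function $t \mapsto \cos(\pi t)$ because this last function is not a
polynomial.

Define a linear functional $\varphi$ on $\mathcal{P}_2(\mathbf{R})$ by letting

$$\varphi(p) = \int_{-1}^{1} p(t)\cos(\pi t)\,dt$$

for each $p \in \mathcal{P}_2(\mathbf{R})$. Now use the orthonormal basis from Example 6.34 and
apply formula 6.43 from the proof of the Riesz representation theorem to see that
if $p \in \mathcal{P}_2(\mathbf{R})$, then $\varphi(p) = \langle p, q\rangle$, where

$$\begin{aligned}
q(x) = {} & \Bigl(\int_{-1}^{1} \sqrt{\tfrac{1}{2}}\cos(\pi t)\,dt\Bigr)\sqrt{\tfrac{1}{2}} + \Bigl(\int_{-1}^{1} \sqrt{\tfrac{3}{2}}\,t\cos(\pi t)\,dt\Bigr)\sqrt{\tfrac{3}{2}}\,x \\
& + \Bigl(\int_{-1}^{1} \sqrt{\tfrac{45}{8}}\bigl(t^2 - \tfrac{1}{3}\bigr)\cos(\pi t)\,dt\Bigr)\sqrt{\tfrac{45}{8}}\bigl(x^2 - \tfrac{1}{3}\bigr).
\end{aligned}$$

A bit of calculus applied to the equation above shows that

$$q(x) = \frac{15}{2\pi^2}(1 - 3x^2).$$

The same procedure shows that if we want to find $q \in \mathcal{P}_5(\mathbf{R})$ such that 6.45
holds for all $p \in \mathcal{P}_5(\mathbf{R})$, then we should take

$$q(x) = \frac{105}{8\pi^4}\bigl((27 - 2\pi^2) + (24\pi^2 - 270)x^2 + (315 - 30\pi^2)x^4\bigr).$$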


Suppose V is finite-dimensional and $\varphi$ is a linear functional on V. Then 6.43
gives a formula for the vector v that satisfies

$$\varphi(u) = \langle u, v\rangle$$

for all $u \in V$. Specifically, we have

$$v = \overline{\varphi(e_1)}\, e_1 + \cdots + \overline{\varphi(e_n)}\, e_n.$$

The right side of the equation above seems to depend on the orthonormal basis
$e_1, \dots, e_n$ as well as on $\varphi$. However, 6.42 tells us that v is uniquely determined
by $\varphi$. Thus the right side of the equation above is the same regardless of which
orthonormal basis $e_1, \dots, e_n$ of V is chosen.

For two additional different proofs of the Riesz representation theorem, see 
6.58 and also Exercise 13 in Section 6C. 
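
The basis independence just described can be checked numerically. The sketch below is only an illustration under stated assumptions: it works in $\mathbf{C}^2$ with the standard inner product, the functional `phi` is a made-up example, and the names `inner`, `riesz_vector`, `standard`, and `rotated` do not come from the text. It applies formula 6.43 with two different orthonormal bases and compares the results.

```python
import math

def inner(u, v):
    """Standard inner product on C^n: linear in the first slot, conjugate-linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def phi(z):
    """A sample linear functional on C^2, chosen only for illustration."""
    return (1 + 2j) * z[0] - 3j * z[1]

def riesz_vector(functional, orthonormal_basis):
    """Formula 6.43: v = conj(phi(e_1)) e_1 + ... + conj(phi(e_n)) e_n."""
    v = [0j] * len(orthonormal_basis[0])
    for e in orthonormal_basis:
        c = functional(e).conjugate()
        v = [vi + c * ei for vi, ei in zip(v, e)]
    return v

s = 1 / math.sqrt(2)
standard = [[1 + 0j, 0j], [0j, 1 + 0j]]
rotated = [[s + 0j, s * 1j], [s + 0j, -s * 1j]]  # another orthonormal basis of C^2

v_std = riesz_vector(phi, standard)
v_rot = riesz_vector(phi, rotated)
print(v_std, v_rot)
```

Both calls should return the same vector up to rounding, namely $(1 - 2i, 3i)$; note the complex conjugates, which are easy to forget when $\mathbf{F} = \mathbf{C}$.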
Exercises 6B


1 Suppose $e_1, \dots, e_m$ is a list of vectors in V such that

$$\|a_1 e_1 + \cdots + a_m e_m\|^2 = |a_1|^2 + \cdots + |a_m|^2$$

for all $a_1, \dots, a_m \in \mathbf{F}$. Show that $e_1, \dots, e_m$ is an orthonormal list.

This exercise provides a converse to 6.24.


2 (a) Suppose $\theta \in \mathbf{R}$. Show that both

$$(\cos\theta, \sin\theta), (-\sin\theta, \cos\theta) \quad\text{and}\quad (\cos\theta, \sin\theta), (\sin\theta, -\cos\theta)$$

are orthonormal bases of $\mathbf{R}^2$.
(b) Show that each orthonormal basis of $\mathbf{R}^2$ is of the form given by one of
the two possibilities in (a).


3 Suppose $e_1, \dots, e_m$ is an orthonormal list in V and $v \in V$. Prove that

$$\|v\|^2 = |\langle v, e_1\rangle|^2 + \cdots + |\langle v, e_m\rangle|^2 \iff v \in \operatorname{span}(e_1, \dots, e_m).$$
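4 Suppose n is a positive integer. Prove that

$$\frac{1}{\sqrt{2\pi}}, \frac{\cos x}{\sqrt{\pi}}, \frac{\cos 2x}{\sqrt{\pi}}, \dots, \frac{\cos nx}{\sqrt{\pi}}, \frac{\sin x}{\sqrt{\pi}}, \frac{\sin 2x}{\sqrt{\pi}}, \dots, \frac{\sin nx}{\sqrt{\pi}}$$

is an orthonormal list of vectors in $C[-\pi, \pi]$, the vector space of continuous
real-valued functions on $[-\pi, \pi]$ with inner product

$$\langle f, g\rangle = \int_{-\pi}^{\pi} fg.$$

Hint: The following formulas should help.

$$\begin{aligned}
(\sin x)(\cos y) &= \frac{\sin(x - y) + \sin(x + y)}{2} \\
(\sin x)(\sin y) &= \frac{\cos(x - y) - \cos(x + y)}{2} \\
(\cos x)(\cos y) &= \frac{\cos(x - y) + \cos(x + y)}{2}
\end{aligned}$$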

5 Suppose $f\colon [-\pi, \pi] \to \mathbf{R}$ is continuous. For each nonnegative integer k,
define

$$a_k = \frac{1}{\sqrt{\pi}} \int_{-\pi}^{\pi} f(x)\cos(kx)\,dx \quad\text{and}\quad b_k = \frac{1}{\sqrt{\pi}} \int_{-\pi}^{\pi} f(x)\sin(kx)\,dx.$$

Prove that

$$\frac{a_0^2}{2} + \sum_{k=1}^{\infty} (a_k^2 + b_k^2) \le \int_{-\pi}^{\pi} f^2.$$

The inequality above is actually an equality for all continuous functions
$f\colon [-\pi, \pi] \to \mathbf{R}$. However, proving that this inequality is an equality
involves Fourier series techniques beyond the scope of this book.


6 Suppose $e_1, \dots, e_n$ is an orthonormal basis of V.
(a) Prove that if $v_1, \dots, v_n$ are vectors in V such that

$$\|e_k - v_k\| < \frac{1}{\sqrt{n}}$$

for each k, then $v_1, \dots, v_n$ is a basis of V.
(b) Show that there exist $v_1, \dots, v_n \in V$ such that

$$\|e_k - v_k\| \le \frac{1}{\sqrt{n}}$$

for each k, but $v_1, \dots, v_n$ is not linearly independent.

This exercise states in (a) that an appropriately small perturbation of an
orthonormal basis is a basis. Then (b) shows that the number $1/\sqrt{n}$ on the
right side of the inequality in (a) cannot be improved upon.


7 Suppose $T \in \mathcal{L}(\mathbf{R}^3)$ has an upper-triangular matrix with respect to the basis
(1, 0, 0), (1, 1, 1), (1, 1, 2). Find an orthonormal basis of $\mathbf{R}^3$ with respect to
which T has an upper-triangular matrix.


8 Make $\mathcal{P}_2(\mathbf{R})$ into an inner product space by defining $\langle p, q\rangle = \int_0^1 pq$ for all
$p, q \in \mathcal{P}_2(\mathbf{R})$.

(a) Apply the Gram–Schmidt procedure to the basis $1, x, x^2$ to produce an
orthonormal basis of $\mathcal{P}_2(\mathbf{R})$.

(b) The differentiation operator (the operator that takes p to $p'$) on $\mathcal{P}_2(\mathbf{R})$
has an upper-triangular matrix with respect to the basis $1, x, x^2$, which is
not an orthonormal basis. Find the matrix of the differentiation operator
on $\mathcal{P}_2(\mathbf{R})$ with respect to the orthonormal basis produced in (a) and
verify that this matrix is upper triangular, as expected from the proof of
6.37.


9 Suppose $e_1, \dots, e_m$ is the result of applying the Gram–Schmidt procedure to
a linearly independent list $v_1, \dots, v_m$ in V. Prove that $\langle v_k, e_k\rangle > 0$ for each
$k = 1, \dots, m$.


10 Suppose $v_1, \dots, v_m$ is a linearly independent list in V. Explain why the
orthonormal list produced by the formulas of the Gram–Schmidt procedure
(6.32) is the only orthonormal list $e_1, \dots, e_m$ in V such that $\langle v_k, e_k\rangle > 0$ and
$\operatorname{span}(v_1, \dots, v_k) = \operatorname{span}(e_1, \dots, e_k)$ for each $k = 1, \dots, m$.

The result in this exercise is used in the proof of 7.58.


11 Find a polynomial $q \in \mathcal{P}_2(\mathbf{R})$ such that $p(\tfrac{1}{2}) = \int_0^1 pq$ for every $p \in \mathcal{P}_2(\mathbf{R})$.


12 Find a polynomial $q \in \mathcal{P}_2(\mathbf{R})$ such that

$$\int_0^1 p(x)\cos(\pi x)\,dx = \int_0^1 pq$$

for every $p \in \mathcal{P}_2(\mathbf{R})$.


13 Show that a list $v_1, \dots, v_m$ of vectors in V is linearly dependent if and only if
the Gram–Schmidt formula in 6.32 produces $f_k = 0$ for some $k \in \{1, \dots, m\}$.

This exercise gives an alternative to Gaussian elimination techniques for
determining whether a list of vectors in an inner product space is linearly
dependent.


14 Suppose V is a real inner product space and $v_1, \dots, v_m$ is a linearly independent
list of vectors in V. Prove that there exist exactly $2^m$ orthonormal lists
$e_1, \dots, e_m$ of vectors in V such that

$$\operatorname{span}(v_1, \dots, v_k) = \operatorname{span}(e_1, \dots, e_k)$$

for all $k \in \{1, \dots, m\}$.


15 Suppose $\langle\cdot,\cdot\rangle_1$ and $\langle\cdot,\cdot\rangle_2$ are inner products on V such that $\langle u, v\rangle_1 = 0$ if
and only if $\langle u, v\rangle_2 = 0$. Prove that there is a positive number c such that
$\langle u, v\rangle_1 = c\langle u, v\rangle_2$ for every $u, v \in V$.

This exercise shows that if two inner products have the same pairs of
orthogonal vectors, then each of the inner products is a scalar multiple
of the other inner product.


16 Suppose V is finite-dimensional. Suppose $\langle\cdot,\cdot\rangle_1$, $\langle\cdot,\cdot\rangle_2$ are inner products on
V with corresponding norms $\|\cdot\|_1$ and $\|\cdot\|_2$. Prove that there exists a positive
number c such that $\|v\|_1 \le c\|v\|_2$ for every $v \in V$.


17 Suppose $\mathbf{F} = \mathbf{C}$ and V is finite-dimensional. Prove that if T is an operator
on V such that 1 is the only eigenvalue of T and $\|Tv\| \le \|v\|$ for all $v \in V$,
then T is the identity operator.


18 Suppose $u_1, \dots, u_m$ is a linearly independent list in V. Show that there exists
$v \in V$ such that $\langle u_k, v\rangle = 1$ for all $k \in \{1, \dots, m\}$.


19 Suppose $v_1, \dots, v_n$ is a basis of V. Prove that there exists a basis $u_1, \dots, u_n$ of
V such that

$$\langle v_j, u_k\rangle = \begin{cases} 0 & \text{if } j \ne k, \\ 1 & \text{if } j = k. \end{cases}$$


20 Suppose $\mathbf{F} = \mathbf{C}$, V is finite-dimensional, and $\mathcal{E} \subseteq \mathcal{L}(V)$ is such that

$$ST = TS$$

for all $S, T \in \mathcal{E}$. Prove that there is an orthonormal basis of V with respect
to which every element of $\mathcal{E}$ has an upper-triangular matrix.

This exercise strengthens Exercise 9(b) in Section 5E (in the context of inner
product spaces) by asserting that the basis in that exercise can be chosen to
be orthonormal.


21 Suppose $\mathbf{F} = \mathbf{C}$, V is finite-dimensional, $T \in \mathcal{L}(V)$, and all eigenvalues
of T have absolute value less than 1. Let $\epsilon > 0$. Prove that there exists a
positive integer m such that $\|T^m v\| \le \epsilon\|v\|$ for every $v \in V$.


22 Suppose $C[-1, 1]$ is the vector space of continuous real-valued functions
on the interval $[-1, 1]$ with inner product given by

$$\langle f, g\rangle = \int_{-1}^{1} fg$$

for all $f, g \in C[-1, 1]$. Let $\varphi$ be the linear functional on $C[-1, 1]$ defined
by $\varphi(f) = f(0)$. Show that there does not exist $g \in C[-1, 1]$ such that

$$\varphi(f) = \langle f, g\rangle$$

for every $f \in C[-1, 1]$.

This exercise shows that the Riesz representation theorem (6.42) does not
hold on infinite-dimensional vector spaces without additional hypotheses
on V and $\varphi$.


23 For all $u, v \in V$, define $d(u, v) = \|u - v\|$.


(a) Show that d is a metric on V. 

(b) Show that if V is finite-dimensional, then d is a complete metric on V 
(meaning that every Cauchy sequence converges). 

(c) Show that every finite-dimensional subspace of V is a closed subset 
of V (with respect to the metric d ). 


This exercise requires familiarity with metric spaces. 


orthogonality at the Supreme Court 


Law professor Richard Friedman presenting a case before the U.S. Supreme 
Court in 2010: 


Mr. Friedman: I think that issue is entirely orthogonal to the issue here
because the Commonwealth is acknowledging

Chief Justice Roberts: I'm sorry. Entirely what? 

Mr. Friedman: Orthogonal. Right angle. Unrelated. Irrelevant. 

Chief Justice Roberts: Oh. 

Justice Scalia: What was that adjective? I liked that. 

Mr. Friedman: Orthogonal. 

Chief Justice Roberts: Orthogonal. 

Mr. Friedman: Right, right. 

Justice Scalia: Orthogonal, ooh. (Laughter.) 


Justice Kennedy: I knew this case presented us a problem. (Laughter.)


Orthogonal Complements


6.46 definition: orthogonal complement, $U^\perp$


If U is a subset of V, then the orthogonal complement of U, denoted by $U^\perp$, is
the set of all vectors in V that are orthogonal to every vector in U:

$$U^\perp = \{v \in V : \langle u, v\rangle = 0 \text{ for every } u \in U\}.$$

The orthogonal complement $U^\perp$ depends on V as well as on U. However, the
inner product space V should always be clear from the context and thus it can be
omitted from the notation.


6.47 example: orthogonal complements 


• If $V = \mathbf{R}^3$ and U is the subset of V consisting of the single point (2, 3, 5), then
$U^\perp$ is the plane $\{(x, y, z) \in \mathbf{R}^3 : 2x + 3y + 5z = 0\}$.


• If $V = \mathbf{R}^3$ and U is the plane $\{(x, y, z) \in \mathbf{R}^3 : 2x + 3y + 5z = 0\}$, then $U^\perp$ is
the line $\{(2t, 3t, 5t) : t \in \mathbf{R}\}$.


• More generally, if U is a plane in $\mathbf{R}^3$ containing the origin, then $U^\perp$ is the line
containing the origin that is perpendicular to U.


• If U is a line in $\mathbf{R}^3$ containing the origin, then $U^\perp$ is the plane containing the
origin that is perpendicular to U.


• If $V = \mathbf{F}^5$ and $U = \{(a, b, 0, 0, 0) \in \mathbf{F}^5 : a, b \in \mathbf{F}\}$, then

$$U^\perp = \{(0, 0, x, y, z) \in \mathbf{F}^5 : x, y, z \in \mathbf{F}\}.$$

• If $e_1, \dots, e_m, f_1, \dots, f_n$ is an orthonormal basis of V, then

$$\bigl(\operatorname{span}(e_1, \dots, e_m)\bigr)^\perp = \operatorname{span}(f_1, \dots, f_n).$$


We begin with some straightforward consequences of the definition. 


6.48 properties of orthogonal complement 


(a) If U is a subset of V, then $U^\perp$ is a subspace of V.
(b) $\{0\}^\perp = V$.
(c) $V^\perp = \{0\}$.
(d) If U is a subset of V, then $U \cap U^\perp \subseteq \{0\}$.
(e) If G and H are subsets of V and $G \subseteq H$, then $H^\perp \subseteq G^\perp$.


Proof
(a) Suppose U is a subset of V. Then $\langle u, 0\rangle = 0$ for every $u \in U$; thus $0 \in U^\perp$.
Suppose $v, w \in U^\perp$. If $u \in U$, then

$$\langle u, v + w\rangle = \langle u, v\rangle + \langle u, w\rangle = 0 + 0 = 0.$$

Thus $v + w \in U^\perp$, which shows that $U^\perp$ is closed under addition.
Similarly, suppose $\lambda \in \mathbf{F}$ and $v \in U^\perp$. If $u \in U$, then

$$\langle u, \lambda v\rangle = \overline{\lambda}\langle u, v\rangle = \overline{\lambda} \cdot 0 = 0.$$

Thus $\lambda v \in U^\perp$, which shows that $U^\perp$ is closed under scalar multiplication.
Thus $U^\perp$ is a subspace of V.


(b) Suppose that $v \in V$. Then $\langle 0, v\rangle = 0$, which implies that $v \in \{0\}^\perp$. Thus
$\{0\}^\perp = V$.


(c) Suppose that $v \in V^\perp$. Then $\langle v, v\rangle = 0$, which implies that $v = 0$. Thus
$V^\perp = \{0\}$.


(d) Suppose U is a subset of V and $u \in U \cap U^\perp$. Then $\langle u, u\rangle = 0$, which implies
that $u = 0$. Thus $U \cap U^\perp \subseteq \{0\}$.


(e) Suppose G and H are subsets of V and $G \subseteq H$. Suppose $v \in H^\perp$. Then
$\langle u, v\rangle = 0$ for every $u \in H$, which implies that $\langle u, v\rangle = 0$ for every $u \in G$.
Hence $v \in G^\perp$. Thus $H^\perp \subseteq G^\perp$.


Recall that if U and W are subspaces of V, then V is the direct sum of U and
W (written $V = U \oplus W$) if each element of V can be written in exactly one way
as a vector in U plus a vector in W (see 1.41). Furthermore, this happens if and
only if $V = U + W$ and $U \cap W = \{0\}$ (see 1.46).

The next result shows that every finite-dimensional subspace of V leads to a 
natural direct sum decomposition of V. See Exercise 16 for an example showing 
that the result below can fail without the hypothesis that the subspace U is finite- 
dimensional. 


6.49 direct sum of a subspace and its orthogonal complement 


Suppose U is a finite-dimensional subspace of V. Then

$$V = U \oplus U^\perp.$$


Proof First we will show that

$$V = U + U^\perp.$$

To do this, suppose that $v \in V$. Let $e_1, \dots, e_m$ be an orthonormal basis of U. We
want to write v as the sum of a vector in U and a vector orthogonal to U.
We have

6.50    $v = \underbrace{\langle v, e_1\rangle e_1 + \cdots + \langle v, e_m\rangle e_m}_{u} + \underbrace{v - \langle v, e_1\rangle e_1 - \cdots - \langle v, e_m\rangle e_m}_{w}.$

Let u and w be defined as in the equation above (as was done in the proof of 6.26).
Because each $e_k \in U$, we see that $u \in U$. Because $e_1, \dots, e_m$ is an orthonormal
list, for each $k = 1, \dots, m$ we have

$$\langle w, e_k\rangle = \langle v, e_k\rangle - \langle v, e_k\rangle = 0.$$

Thus w is orthogonal to every vector in $\operatorname{span}(e_1, \dots, e_m)$, which shows that $w \in U^\perp$.
Hence we have written $v = u + w$, where $u \in U$ and $w \in U^\perp$, completing the
proof that $V = U + U^\perp$.


From 6.48(d), we know that $U \cap U^\perp = \{0\}$. Now the equation $V = U + U^\perp$
implies that $V = U \oplus U^\perp$ (see 1.46).
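
The decomposition in 6.50 is easy to carry out by hand or by machine. Here is a small sketch, assuming $V = \mathbf{R}^3$ with the usual dot product; the subspace U, its orthonormal basis `e1, e2`, the vector `v`, and the helper name `decompose` are all hypothetical choices made so that the arithmetic stays exact.

```python
from fractions import Fraction as Fr

def inner(x, y):
    """Usual dot product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def decompose(v, orthonormal_basis):
    """Split v as u + w following 6.50: u lies in U, w is orthogonal to U."""
    u = [Fr(0)] * len(v)
    for e in orthonormal_basis:
        c = inner(v, e)  # the coefficient <v, e_k>
        u = [ui + c * ei for ui, ei in zip(u, e)]
    w = [vi - ui for vi, ui in zip(v, u)]
    return u, w

# An orthonormal basis of a two-dimensional subspace U of R^3.
e1 = [Fr(3, 5), Fr(4, 5), Fr(0)]
e2 = [Fr(0), Fr(0), Fr(1)]
v = [Fr(1), Fr(2), Fr(3)]

u, w = decompose(v, [e1, e2])
print(u, w)
```

For this choice of data, u works out to (33/25, 44/25, 3) and w to (−8/25, 6/25, 0), and w has inner product 0 with both basis vectors.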


Now we can see how to compute $\dim U^\perp$ from $\dim U$.


6.51 dimension of orthogonal complement


Suppose V is finite-dimensional and U is a subspace of V. Then

$$\dim U^\perp = \dim V - \dim U.$$


Proof The formula for $\dim U^\perp$ follows immediately from 6.49 and 3.94.


The next result is an important consequence of 6.49.


6.52 orthogonal complement of the orthogonal complement


Suppose U is a finite-dimensional subspace of V. Then

$$U = (U^\perp)^\perp.$$


Proof First we will show that

6.53    $U \subseteq (U^\perp)^\perp.$

To do this, suppose $u \in U$. Then $\langle u, w\rangle = 0$ for every $w \in U^\perp$ (by the definition
of $U^\perp$). Because u is orthogonal to every vector in $U^\perp$, we have $u \in (U^\perp)^\perp$,
completing the proof of 6.53.

To prove the inclusion in the other direction, suppose $v \in (U^\perp)^\perp$. By 6.49,
we can write $v = u + w$, where $u \in U$ and $w \in U^\perp$. We have $v - u = w \in U^\perp$.
Because $v \in (U^\perp)^\perp$ and $u \in (U^\perp)^\perp$ (from 6.53), we have $v - u \in (U^\perp)^\perp$. Thus
$v - u \in U^\perp \cap (U^\perp)^\perp$, which implies that $v - u = 0$ [by 6.48(d)], which implies
that $v = u$, which implies that $v \in U$. Thus $(U^\perp)^\perp \subseteq U$, which along with 6.53
completes the proof.


Suppose U is a subspace of V and we want to show that $U = V$. In some
situations, the easiest way to do this is to show that the only vector orthogonal to
U is 0, and then use the result below. For example, the result below is useful for
Exercise 4.

Exercise 16(a) shows that the result below is not true without the hypothesis
that U is finite-dimensional.


6.54 $U^\perp = \{0\} \iff U = V$ (for U a finite-dimensional subspace of V)


Suppose U is a finite-dimensional subspace of V. Then

$$U^\perp = \{0\} \iff U = V.$$


Proof First suppose $U^\perp = \{0\}$. Then by 6.52, $U = (U^\perp)^\perp = \{0\}^\perp = V$, as
desired.

Conversely, if $U = V$, then $U^\perp = V^\perp = \{0\}$ by 6.48(c).

We now define an operator $P_U$ for each finite-dimensional subspace U of V.


6.55 definition: orthogonal projection, $P_U$


Suppose U is a finite-dimensional subspace of V. The orthogonal projection
of V onto U is the operator $P_U \in \mathcal{L}(V)$ defined as follows: For each $v \in V$,
write $v = u + w$, where $u \in U$ and $w \in U^\perp$. Then let $P_U v = u$.


The direct sum decomposition $V = U \oplus U^\perp$ given by 6.49 shows that each
$v \in V$ can be uniquely written in the form $v = u + w$ with $u \in U$ and $w \in U^\perp$.
Thus $P_U v$ is well defined. See the figure that accompanies the proof of 6.61 for
the picture describing $P_U v$ that you should keep in mind.


6.56 example: orthogonal projection onto one-dimensional subspace 


Suppose $u \in V$ with $u \ne 0$ and U is the one-dimensional subspace of V
defined by $U = \operatorname{span}(u)$.
If $v \in V$, then

$$v = \frac{\langle v, u\rangle}{\|u\|^2} u + \Bigl(v - \frac{\langle v, u\rangle}{\|u\|^2} u\Bigr),$$

where the first term on the right is in $\operatorname{span}(u)$ (and thus is in U) and the second
term on the right is orthogonal to u (and thus is in $U^\perp$). Thus $P_U v$ equals the first
term on the right. In other words, we have the formula

$$P_U v = \frac{\langle v, u\rangle}{\|u\|^2} u$$

for every $v \in V$.
Taking $v = u$, the formula above becomes $P_U u = u$, as expected. Furthermore,
taking $v \in \{u\}^\perp$, the formula above becomes $P_U v = 0$, also as expected.


6.57 properties of orthogonal projection $P_U$


Suppose U is a finite-dimensional subspace of V. Then
(a) $P_U \in \mathcal{L}(V)$;
(b) $P_U u = u$ for every $u \in U$;
(c) $P_U w = 0$ for every $w \in U^\perp$;
(d) $\operatorname{range} P_U = U$;
(e) $\operatorname{null} P_U = U^\perp$;
(f) $v - P_U v \in U^\perp$ for every $v \in V$;
(g) $P_U^2 = P_U$;
(h) $\|P_U v\| \le \|v\|$ for every $v \in V$;
(i) if $e_1, \dots, e_m$ is an orthonormal basis of U and $v \in V$, then

$$P_U v = \langle v, e_1\rangle e_1 + \cdots + \langle v, e_m\rangle e_m.$$


Proof

(a) To show that $P_U$ is a linear map on V, suppose $v_1, v_2 \in V$. Write

$$v_1 = u_1 + w_1 \quad\text{and}\quad v_2 = u_2 + w_2$$

with $u_1, u_2 \in U$ and $w_1, w_2 \in U^\perp$. Thus $P_U v_1 = u_1$ and $P_U v_2 = u_2$. Now

$$v_1 + v_2 = (u_1 + u_2) + (w_1 + w_2),$$

where $u_1 + u_2 \in U$ and $w_1 + w_2 \in U^\perp$. Thus

$$P_U(v_1 + v_2) = u_1 + u_2 = P_U v_1 + P_U v_2.$$

Similarly, suppose $\lambda \in \mathbf{F}$ and $v \in V$. Write $v = u + w$, where $u \in U$
and $w \in U^\perp$. Then $\lambda v = \lambda u + \lambda w$ with $\lambda u \in U$ and $\lambda w \in U^\perp$. Thus
$P_U(\lambda v) = \lambda u = \lambda P_U v$.

Hence $P_U$ is a linear map from V to V.

(b) Suppose $u \in U$. We can write $u = u + 0$, where $u \in U$ and $0 \in U^\perp$. Thus
$P_U u = u$.

(c) Suppose $w \in U^\perp$. We can write $w = 0 + w$, where $0 \in U$ and $w \in U^\perp$. Thus
$P_U w = 0$.

(d) The definition of $P_U$ implies that $\operatorname{range} P_U \subseteq U$. Furthermore, (b) implies
that $U \subseteq \operatorname{range} P_U$. Thus $\operatorname{range} P_U = U$.

(e) The inclusion $U^\perp \subseteq \operatorname{null} P_U$ follows from (c). To prove the inclusion in the
other direction, note that if $v \in \operatorname{null} P_U$ then the decomposition given by 6.49
must be $v = 0 + v$, where $0 \in U$ and $v \in U^\perp$. Thus $\operatorname{null} P_U \subseteq U^\perp$.

(f) If $v \in V$ and $v = u + w$ with $u \in U$ and $w \in U^\perp$, then

$$v - P_U v = v - u = w \in U^\perp.$$

(g) If $v \in V$ and $v = u + w$ with $u \in U$ and $w \in U^\perp$, then

$$(P_U^2)v = P_U(P_U v) = P_U u = u = P_U v.$$

(h) If $v \in V$ and $v = u + w$ with $u \in U$ and $w \in U^\perp$, then

$$\|P_U v\|^2 = \|u\|^2 \le \|u\|^2 + \|w\|^2 = \|v\|^2,$$

where the last equality comes from the Pythagorean theorem.

(i) The formula for $P_U v$ follows from equation 6.50 in the proof of 6.49.
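
Several of the properties in 6.57 can be spot-checked with the one-dimensional formula from Example 6.56. The sketch below assumes $V = \mathbf{R}^3$ with the usual dot product and uses made-up vectors `u` and `v`; the name `project_onto_line` is illustrative only. Exact rational arithmetic is used so that equalities can be tested without rounding.

```python
from fractions import Fraction as Fr

def inner(x, y):
    """Usual dot product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def project_onto_line(v, u):
    """P_U v = (<v, u> / ||u||^2) u for U = span(u), with u nonzero (Example 6.56)."""
    c = inner(v, u) / inner(u, u)
    return [c * ui for ui in u]

u = [Fr(1), Fr(2), Fr(2)]   # spans the subspace U
v = [Fr(3), Fr(0), Fr(-1)]  # an arbitrary vector to project

p = project_onto_line(v, u)            # P_U v
w = [vi - pi for vi, pi in zip(v, p)]  # v - P_U v

print(p)                                 # the projection
print(project_onto_line(p, u) == p)      # property (g): projecting twice changes nothing
print(inner(w, u) == 0)                  # property (f): v - P_U v is orthogonal to U
print(inner(p, p) <= inner(v, v))        # property (h): ||P_U v|| <= ||v||
```

A check on one pair of vectors is of course not a proof; it is only a quick way to catch a misremembered formula.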


In the previous section we proved the Riesz representation theorem (6.42), 
whose key part states that every linear functional on a finite-dimensional inner 
product space is given by taking the inner product with some fixed vector. Seeing 
a different proof often provides new insight. Thus we now give a new proof of 
the key part of the Riesz representation theorem using orthogonal complements 
instead of orthonormal bases as in our previous proof. 

The restatement below of the Riesz representation theorem provides an iden- 
tification of V with V’. We will prove only the “onto” part of the result below 
because the more routine “one-to-one” part of the result can be proved as in 6.42. 

Intuition behind this new proof: If $\varphi \in V'$, $v \in V$, and $\varphi(u) = \langle u, v\rangle$ for all
$u \in V$, then $v \in (\operatorname{null} \varphi)^\perp$. However, $(\operatorname{null} \varphi)^\perp$ is a one-dimensional subspace
of V (except for the trivial case in which $\varphi = 0$), as follows from 6.51 and 3.21.
Thus we can obtain v by choosing any nonzero element of $(\operatorname{null} \varphi)^\perp$ and then
multiplying by an appropriate scalar, as is done in the proof below.


6.58 Riesz representation theorem, revisited 


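Suppose V is finite-dimensional. For each $v \in V$, define $\varphi_v \in V'$ by

$$\varphi_v(u) = \langle u, v\rangle$$

for each $u \in V$. Then $v \mapsto \varphi_v$ is a one-to-one function from V onto $V'$.

Caution: The function $v \mapsto \varphi_v$ is a linear mapping from V to $V'$ if $\mathbf{F} = \mathbf{R}$.
However, this function is not linear if $\mathbf{F} = \mathbf{C}$ because $\varphi_{\lambda v} = \overline{\lambda}\varphi_v$ if $\lambda \in \mathbf{C}$.


Proof To show that $v \mapsto \varphi_v$ is surjective, suppose $\varphi \in V'$. If $\varphi = 0$, then $\varphi = \varphi_0$.
Thus assume $\varphi \ne 0$. Hence $\operatorname{null} \varphi \ne V$, which implies that $(\operatorname{null} \varphi)^\perp \ne \{0\}$ (by
6.49 with $U = \operatorname{null} \varphi$).

Let $w \in (\operatorname{null} \varphi)^\perp$ be such that $w \ne 0$. Let

6.59    $v = \dfrac{\overline{\varphi(w)}}{\|w\|^2}\, w.$

Then $v \in (\operatorname{null} \varphi)^\perp$. Also, $v \ne 0$ (because $w \notin \operatorname{null} \varphi$).

Taking the norm of both sides of 6.59 gives

6.60    $\|v\| = \dfrac{|\varphi(w)|}{\|w\|}.$

Applying $\varphi$ to both sides of 6.59 and then using 6.60, we have

$$\varphi(v) = \frac{|\varphi(w)|^2}{\|w\|^2} = \|v\|^2.$$

Now suppose $u \in V$. Using the equation above, we have

$$u = \Bigl(u - \frac{\varphi(u)}{\varphi(v)} v\Bigr) + \frac{\varphi(u)}{\|v\|^2} v.$$

The first term in parentheses above is in $\operatorname{null} \varphi$ and hence is orthogonal to v. Thus
taking the inner product of both sides of the equation above with v shows that

$$\langle u, v\rangle = \frac{\varphi(u)}{\|v\|^2}\langle v, v\rangle = \varphi(u).$$

Thus $\varphi = \varphi_v$, showing that $v \mapsto \varphi_v$ is surjective, as desired.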


See Exercise 13 for yet another proof of the Riesz representation theorem. 


Minimization Problems 


The following problem often arises: Given a subspace U of V and a point
$v \in V$, find a point $u \in U$ such that $\|v - u\|$ is as small as possible. The next
result shows that $u = P_U v$ is the unique solution of this minimization problem.

The remarkable simplicity of the solution to this minimization problem has
led to many important applications of inner product spaces outside of pure
mathematics.


6.61 minimizing distance to a subspace 
Suppose U is a finite-dimensional subspace of V, $v \in V$, and $u \in U$. Then

$$\|v - P_U v\| \le \|v - u\|.$$

Furthermore, the inequality above is an equality if and only if $u = P_U v$.


Proof We have

6.62    $\begin{aligned} \|v - P_U v\|^2 &\le \|v - P_U v\|^2 + \|P_U v - u\|^2 \\ &= \|(v - P_U v) + (P_U v - u)\|^2 \\ &= \|v - u\|^2, \end{aligned}$

where the first line above holds because $0 \le \|P_U v - u\|^2$,
the second line above comes from the Pythagorean theorem
[which applies because $v - P_U v \in U^\perp$ by 6.57(f),
and $P_U v - u \in U$], and the third line above holds by
simple algebra. Taking square roots gives the desired
inequality.

The inequality proved above is an equality if and
only if 6.62 is an equality, which happens if and only if
$\|P_U v - u\| = 0$, which happens if and only if $u = P_U v$.


[Figure: $P_U v$ is the closest point in U to v.]


The last result is often combined with the formula 
6.57(i) to compute explicit solutions to minimization 
problems, as in the following example. 


6.63 example: using linear algebra to approximate the sine function 


Suppose we want to find a polynomial u with real coefficients and of degree
at most 5 that approximates the sine function as well as possible on the interval
$[-\pi, \pi]$, in the sense that

$$\int_{-\pi}^{\pi} |\sin x - u(x)|^2\,dx$$

is as small as possible.
Let $C[-\pi, \pi]$ denote the real inner product space of continuous real-valued
functions on $[-\pi, \pi]$ with inner product

6.64    $\langle f, g\rangle = \int_{-\pi}^{\pi} fg.$

Let $v \in C[-\pi, \pi]$ be the function defined by $v(x) = \sin x$. Let U denote the
subspace of $C[-\pi, \pi]$ consisting of the polynomials with real coefficients and of
degree at most 5. Our problem can now be reformulated as follows:

Find $u \in U$ such that $\|v - u\|$ is as small as possible.


To compute the solution to our approximation problem, first apply the
Gram–Schmidt procedure (using the inner product given by 6.64) to the basis
$1, x, x^2, x^3, x^4, x^5$ of U, producing an orthonormal basis $e_1, e_2, e_3, e_4, e_5, e_6$ of U.
Then, again using the inner product given by 6.64, compute $P_U v$ using 6.57(i)
(with $m = 6$). Doing this computation shows that $P_U v$ is the function u defined by

6.65    $u(x) = 0.987862x - 0.155271x^3 + 0.00564312x^5,$

where the $\pi$'s that appear in the exact answer have been replaced with a good
decimal approximation. By 6.61, the polynomial u above is the best approximation
to the sine function on $[-\pi, \pi]$ using polynomials of degree at most 5 (here “best
approximation” means in the sense of minimizing $\int_{-\pi}^{\pi} |\sin x - u(x)|^2\,dx$).

A computer that can integrate is useful here.

To see how good this approximation is, the next figure shows the graphs of
both the sine function and our approximation u given by 6.65 over the interval
$[-\pi, \pi]$.


[Figure: Graphs on $[-\pi, \pi]$ of the sine function (red) and its best
fifth degree polynomial approximation u (blue) from 6.65.]


Our approximation 6.65 is so accurate that the two graphs are almost identical;
our eyes may see only one graph! Here the red graph is placed almost exactly
over the blue graph. If you are viewing this on an electronic device, enlarge the
picture above by 400% near $\pi$ or $-\pi$ to see a small gap between the two graphs.

Another well-known approximation to the sine function by a polynomial of 
degree 5 is given by the Taylor polynomial p defined by 

6.66    $p(x) = x - \dfrac{x^3}{3!} + \dfrac{x^5}{5!}.$


To see how good this approximation is, the next picture shows the graphs of both
the sine function and the Taylor polynomial p over the interval $[-\pi, \pi]$.


[Figure: Graphs on $[-\pi, \pi]$ of the sine function (red)
and the Taylor polynomial (blue) from 6.66.]


The Taylor polynomial is an excellent approximation to sin x for x near 0. But 
the picture above shows that for |x| > 2, the Taylor polynomial is not so accurate, 
especially compared to 6.65. For example, taking x = 3, our approximation 6.65 
estimates sin3 with an error of approximately 0.001, but the Taylor series 6.66 
estimates sin3 with an error of approximately 0.4. Thus at x = 3, the error in 
the Taylor series is hundreds of times larger than the error given by 6.65. Linear 
algebra has helped us discover an approximation to the sine function that improves 
upon what we learned in calculus! 
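
The coefficients in 6.65 can be reproduced with a few lines of code. The sketch below is not the Gram–Schmidt route described in the example; as a swapped-in technique it solves the normal equations $\sum_j \langle x^{p_j}, x^{p_i}\rangle c_j = \langle \sin, x^{p_i}\rangle$, which characterize the same projection $P_U v$ because $v - P_U v$ is orthogonal to U. It also assumes that only the odd powers $x, x^3, x^5$ are needed, which holds because sine is odd and odd functions are orthogonal to even ones on $[-\pi, \pi]$. All function names are illustrative.

```python
import math

POWERS = [1, 3, 5]  # only odd powers contribute, by symmetry

def moment_sin(k):
    """Integral of x**k * sin(x) over [-pi, pi] for odd k, via integration by parts:
    I_1 = 2*pi and I_k = 2*pi**k - k*(k-1)*I_{k-2}."""
    value = 2 * math.pi
    for j in range(3, k + 1, 2):
        value = 2 * math.pi ** j - j * (j - 1) * value
    return value

def gram_entry(p, q):
    """<x^p, x^q> = integral of x^(p+q) over [-pi, pi], valid when p + q is even."""
    n = p + q
    return 2 * math.pi ** (n + 1) / (n + 1)

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

gram = [[gram_entry(p, q) for q in POWERS] for p in POWERS]
coeffs = solve(gram, [moment_sin(p) for p in POWERS])

def u(x):
    """Least-squares polynomial approximation of sin on [-pi, pi]."""
    return sum(c * x ** p for c, p in zip(coeffs, POWERS))

def taylor(x):
    """Degree 5 Taylor polynomial of sin at 0 (formula 6.66)."""
    return x - x ** 3 / 6 + x ** 5 / 120

print(coeffs)
print(abs(u(3) - math.sin(3)), abs(taylor(3) - math.sin(3)))
```

The printed coefficients should agree with 6.65 to the digits shown there, and the two errors at $x = 3$ should be of the sizes quoted above.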
Pseudoinverse


Suppose $T \in \mathcal{L}(V, W)$ and $b \in W$. Consider the problem of finding $x \in V$ such
that

$$Tx = b.$$

For example, if $V = \mathbf{F}^n$ and $W = \mathbf{F}^m$, then the equation above could represent a
system of m linear equations in n unknowns.

If T is invertible, then the unique solution to the equation above is $x = T^{-1}b$.
However, if T is not invertible, then for some b € W there may not exist any 
solutions of the equation above, and for some b € W there may exist infinitely 
many solutions of the equation above. 

If T is not invertible, then we can still try to do as well as possible with the 
equation above. For example, if the equation above has no solutions, then instead 
of solving the equation Tx — b = 0, we can try to find x € V such that ||Tx — b|| 
is as small as possible. As another example, if the equation above has infinitely 
many solutions x € V, then among all those solutions we can try to find one such 
that ||x|| is as small as possible. 

The pseudoinverse will provide the tool to solve the equation above as well 
as possible, even when T is not invertible. We need the next result to define the 
pseudoinverse. 

In the next two proofs, we will use without further comment the result that if
V is finite-dimensional and $T \in \mathcal{L}(V, W)$, then $\operatorname{null} T$, $(\operatorname{null} T)^\perp$, and $\operatorname{range} T$ are
all finite-dimensional.


6.67 restriction of a linear map to obtain a one-to-one and onto map 


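Suppose V is finite-dimensional and $T \in \mathcal{L}(V, W)$. Then $T|_{(\operatorname{null} T)^\perp}$ is a
one-to-one map of $(\operatorname{null} T)^\perp$ onto $\operatorname{range} T$.


Proof Suppose that $v \in (\operatorname{null} T)^\perp$ and $T|_{(\operatorname{null} T)^\perp} v = 0$. Hence $Tv = 0$ and
thus $v \in (\operatorname{null} T) \cap (\operatorname{null} T)^\perp$, which implies that $v = 0$ [by 6.48(d)]. Hence
$\operatorname{null} T|_{(\operatorname{null} T)^\perp} = \{0\}$, which implies that $T|_{(\operatorname{null} T)^\perp}$ is injective, as desired.

Clearly $\operatorname{range} T|_{(\operatorname{null} T)^\perp} \subseteq \operatorname{range} T$. To prove the inclusion in the other direction,
suppose $w \in \operatorname{range} T$. Hence there exists $v \in V$ such that $w = Tv$. There exist
$u \in \operatorname{null} T$ and $x \in (\operatorname{null} T)^\perp$ such that $v = u + x$ (by 6.49). Now

$$T|_{(\operatorname{null} T)^\perp}\, x = Tx = Tv - Tu = w - 0 = w,$$

which shows that $w \in \operatorname{range} T|_{(\operatorname{null} T)^\perp}$. Hence $\operatorname{range} T \subseteq \operatorname{range} T|_{(\operatorname{null} T)^\perp}$, completing
the proof that $\operatorname{range} T|_{(\operatorname{null} T)^\perp} = \operatorname{range} T$.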


Now we can define the pseudoinverse $T^\dagger$ (pronounced "T dagger") of a linear
map $T$. In the next definition (and from now on), think of
$T|_{(\operatorname{null} T)^\perp}$ as an invertible linear map from
$(\operatorname{null} T)^\perp$ onto $\operatorname{range} T$, as is justified by the
result above.

To produce the pseudoinverse notation $T^\dagger$ in TeX, type T^\dagger.
6.68 definition: pseudoinverse, $T^\dagger$

Suppose that $V$ is finite-dimensional and $T \in \mathcal{L}(V, W)$. The
pseudoinverse $T^\dagger \in \mathcal{L}(W, V)$ of $T$ is the linear map from $W$
to $V$ defined by

$$T^\dagger w = (T|_{(\operatorname{null} T)^\perp})^{-1} P_{\operatorname{range} T}\, w$$

for each $w \in W$.

Recall that $P_{\operatorname{range} T}\, w = 0$ if $w \in (\operatorname{range} T)^\perp$
and $P_{\operatorname{range} T}\, w = w$ if $w \in \operatorname{range} T$.
Thus if $w \in (\operatorname{range} T)^\perp$, then $T^\dagger w = 0$, and if
$w \in \operatorname{range} T$, then $T^\dagger w$ is the unique element of
$(\operatorname{null} T)^\perp$ such that $T(T^\dagger w) = w$.

The pseudoinverse behaves much like an inverse, as we will see.


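6.69 algebraic properties of the pseudoinverse

Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V, W)$.

(a) If $T$ is invertible, then $T^\dagger = T^{-1}$.

(b) $TT^\dagger = P_{\operatorname{range} T}$ = the orthogonal projection of $W$ onto
$\operatorname{range} T$.

(c) $T^\dagger T = P_{(\operatorname{null} T)^\perp}$ = the orthogonal projection of
$V$ onto $(\operatorname{null} T)^\perp$.

Proof

(a) Suppose $T$ is invertible. Then $(\operatorname{null} T)^\perp = V$ and
$\operatorname{range} T = W$. Thus $T|_{(\operatorname{null} T)^\perp} = T$ and
$P_{\operatorname{range} T}$ is the identity operator on $W$. Hence
$T^\dagger = T^{-1}$.

(b) Suppose $w \in \operatorname{range} T$. Thus

$$TT^\dagger w = T(T|_{(\operatorname{null} T)^\perp})^{-1} w = w = P_{\operatorname{range} T}\, w.$$

If $w \in (\operatorname{range} T)^\perp$, then $T^\dagger w = 0$. Hence
$TT^\dagger w = 0 = P_{\operatorname{range} T}\, w$. Thus $TT^\dagger$ and
$P_{\operatorname{range} T}$ agree on $\operatorname{range} T$ and on
$(\operatorname{range} T)^\perp$. Hence these two linear maps are equal (by 6.49).

(c) Suppose $v \in (\operatorname{null} T)^\perp$. Because $Tv \in \operatorname{range} T$,
the definition of $T^\dagger$ shows that

$$T^\dagger(Tv) = (T|_{(\operatorname{null} T)^\perp})^{-1}(Tv) = v = P_{(\operatorname{null} T)^\perp} v.$$

If $v \in \operatorname{null} T$, then $T^\dagger T v = 0 = P_{(\operatorname{null} T)^\perp} v$.
Thus $T^\dagger T$ and $P_{(\operatorname{null} T)^\perp}$ agree on
$(\operatorname{null} T)^\perp$ and on $\operatorname{null} T$. Hence these two linear
maps are equal (by 6.49).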


Suppose that $T \in \mathcal{L}(V, W)$. If $T$ is surjective, then $TT^\dagger$ is
the identity operator on $W$, as follows from (b) in the result above. If $T$ is
injective, then $T^\dagger T$ is the identity operator on $V$, as follows from (c)
in the result above. For additional algebraic properties of the pseudoinverse, see
Exercises 19–23.

The pseudoinverse is also called the Moore-Penrose inverse.
Suppose $T \in \mathcal{L}(V, W)$, $b \in W$, and we want to find $x \in V$ that
solves the equation

$$Tx = b.$$

If $T$ is invertible, then $x = T^{-1}b$ is the unique solution. If $T$ is not
invertible, then $T^{-1}$ is not defined. However, the pseudoinverse $T^\dagger$ is
defined. Taking $x = T^\dagger b$ makes $Tx$ as close to $b$ as possible, as shown by
(a) of the next result. Thus the pseudoinverse provides what is called a best fit to
the equation above.

Among all vectors $x \in V$ that make $Tx$ as close as possible to $b$, the vector
$T^\dagger b$ has the smallest norm, as shown by combining (b) in the next result
with the condition for equality in (a).


6.70 pseudoinverse provides best approximate solution or best solution

Suppose $V$ is finite-dimensional, $T \in \mathcal{L}(V, W)$, and $b \in W$.

(a) If $x \in V$, then

$$\|T(T^\dagger b) - b\| \le \|Tx - b\|,$$

with equality if and only if $x \in T^\dagger b + \operatorname{null} T$.

(b) If $x \in T^\dagger b + \operatorname{null} T$, then

$$\|T^\dagger b\| \le \|x\|,$$

with equality if and only if $x = T^\dagger b$.

Proof

(a) Suppose $x \in V$. Then

$$Tx - b = (Tx - TT^\dagger b) + (TT^\dagger b - b).$$

The first term in parentheses above is in $\operatorname{range} T$. Because the
operator $TT^\dagger$ is the orthogonal projection of $W$ onto $\operatorname{range} T$
[by 6.69(b)], the second term in parentheses above is in
$(\operatorname{range} T)^\perp$ [see 6.57(f)].

Thus the Pythagorean theorem implies the desired inequality that the norm of
the second term in parentheses above is less than or equal to $\|Tx - b\|$, with
equality if and only if the first term in parentheses above equals 0. Hence
we have equality if and only if $x - T^\dagger b \in \operatorname{null} T$, which is
equivalent to the statement that $x \in T^\dagger b + \operatorname{null} T$,
completing the proof of (a).

(b) Suppose $x \in T^\dagger b + \operatorname{null} T$. Hence
$x - T^\dagger b \in \operatorname{null} T$. Now

$$x = (x - T^\dagger b) + T^\dagger b.$$

The definition of $T^\dagger$ implies that $T^\dagger b \in (\operatorname{null} T)^\perp$.
Thus the Pythagorean theorem implies that $\|T^\dagger b\| \le \|x\|$, with equality
if and only if $x = T^\dagger b$.

A formula for $T^\dagger$ will be given in the next chapter (see 7.78).
6.71 example: pseudoinverse of a linear map from $\mathbf{F}^4$ to $\mathbf{F}^3$

Suppose $T \in \mathcal{L}(\mathbf{F}^4, \mathbf{F}^3)$ is defined by

$$T(a, b, c, d) = (a + b + c,\; 2c + d,\; 0).$$

This linear map is neither injective nor surjective, but we can compute its
pseudoinverse. To do this, first note that
$\operatorname{range} T = \{(x, y, 0) : x, y \in \mathbf{F}\}$. Thus

$$P_{\operatorname{range} T}(x, y, z) = (x, y, 0)$$

for each $(x, y, z) \in \mathbf{F}^3$. Also,

$$\operatorname{null} T = \{(a, b, c, d) \in \mathbf{F}^4 : a + b + c = 0 \text{ and } 2c + d = 0\}.$$

The list $(-1, 1, 0, 0), (-1, 0, 1, -2)$ of two vectors in $\operatorname{null} T$
spans $\operatorname{null} T$ because if $(a, b, c, d) \in \operatorname{null} T$ then

$$(a, b, c, d) = b(-1, 1, 0, 0) + c(-1, 0, 1, -2).$$

Because the list $(-1, 1, 0, 0), (-1, 0, 1, -2)$ is linearly independent, this list
is a basis of $\operatorname{null} T$.

Now suppose $(x, y, z) \in \mathbf{F}^3$. Then

6.72 $$T^\dagger(x, y, z) = (T|_{(\operatorname{null} T)^\perp})^{-1} P_{\operatorname{range} T}(x, y, z) = (T|_{(\operatorname{null} T)^\perp})^{-1}(x, y, 0).$$

The right side of the equation above is the vector $(a, b, c, d) \in \mathbf{F}^4$
such that $T(a, b, c, d) = (x, y, 0)$ and $(a, b, c, d) \in (\operatorname{null} T)^\perp$.
In other words, $a, b, c, d$ must satisfy the following equations:

$$\begin{aligned}
a + b + c &= x \\
2c + d &= y \\
-a + b &= 0 \\
-a + c - 2d &= 0,
\end{aligned}$$

where the first two equations are equivalent to the equation
$T(a, b, c, d) = (x, y, 0)$ and the last two equations come from the condition for
$(a, b, c, d)$ to be orthogonal to each of the basis vectors
$(-1, 1, 0, 0), (-1, 0, 1, -2)$ in this basis of $\operatorname{null} T$.
Thinking of $x$ and $y$ as constants and $a, b, c, d$ as unknowns, we can solve the
system above of four equations in four unknowns, getting

$$a = \tfrac{1}{11}(5x - 2y), \quad b = \tfrac{1}{11}(5x - 2y), \quad c = \tfrac{1}{11}(x + 4y), \quad d = \tfrac{1}{11}(-2x + 3y).$$

Hence 6.72 tells us that

$$T^\dagger(x, y, z) = \tfrac{1}{11}(5x - 2y,\; 5x - 2y,\; x + 4y,\; -2x + 3y).$$

The formula above for $T^\dagger$ shows that $TT^\dagger(x, y, z) = (x, y, 0)$ for all
$(x, y, z) \in \mathbf{F}^3$, which illustrates the equation
$TT^\dagger = P_{\operatorname{range} T}$ from 6.69(b).
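The computation in Example 6.71 can be checked mechanically. The following Python sketch is an illustration added here, not part of the original text; the function names `T`, `T_dagger`, and `check` are hypothetical. It uses exact rational arithmetic to confirm that the vector given by the formula for the pseudoinverse is orthogonal to the basis of the null space found above and is mapped by T to (x, y, 0).

```python
from fractions import Fraction

def T(a, b, c, d):
    """The linear map of Example 6.71: T(a, b, c, d) = (a + b + c, 2c + d, 0)."""
    return (a + b + c, 2 * c + d, 0)

def T_dagger(x, y, z):
    """The formula for the pseudoinverse derived in Example 6.71 (z plays no role)."""
    x, y = Fraction(x), Fraction(y)
    return ((5 * x - 2 * y) / 11, (5 * x - 2 * y) / 11,
            (x + 4 * y) / 11, (-2 * x + 3 * y) / 11)

def dot(u, v):
    """Euclidean inner product of two real (here rational) vectors."""
    return sum(p * q for p, q in zip(u, v))

# Basis of null T found in the example.
NULL_BASIS = [(-1, 1, 0, 0), (-1, 0, 1, -2)]

def check(x, y, z):
    """True if T_dagger(x, y, z) lies in (null T)^perp and T maps it to (x, y, 0)."""
    v = T_dagger(x, y, z)
    in_null_perp = all(dot(v, n) == 0 for n in NULL_BASIS)
    projects = T(*v) == (x, y, 0)
    return in_null_perp and projects
```

For instance, `check(3, -7, 5)` evaluates both conditions for the vector (3, -7, 5). Because the arithmetic is exact, the comparison uses equality rather than a tolerance.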
Exercises 6C

1 Suppose $v_1, \dots, v_m \in V$. Prove that

$$\{v_1, \dots, v_m\}^\perp = (\operatorname{span}(v_1, \dots, v_m))^\perp.$$

2 Suppose $U$ is a subspace of $V$ with basis $u_1, \dots, u_m$ and

$$u_1, \dots, u_m, v_1, \dots, v_n$$

is a basis of $V$. Prove that if the Gram–Schmidt procedure is applied to the
basis of $V$ above, producing a list $e_1, \dots, e_m, f_1, \dots, f_n$, then
$e_1, \dots, e_m$ is an orthonormal basis of $U$ and $f_1, \dots, f_n$ is an
orthonormal basis of $U^\perp$.

3 Suppose $U$ is the subspace of $\mathbf{R}^4$ defined by

$$U = \operatorname{span}((1, 2, 3, -4), (-5, 4, 3, 2)).$$

Find an orthonormal basis of $U$ and an orthonormal basis of $U^\perp$.

4 Suppose $e_1, \dots, e_n$ is a list of vectors in $V$ with $\|e_k\| = 1$ for each
$k = 1, \dots, n$ and

$$\|v\|^2 = |\langle v, e_1 \rangle|^2 + \cdots + |\langle v, e_n \rangle|^2$$

for all $v \in V$. Prove that $e_1, \dots, e_n$ is an orthonormal basis of $V$.

This exercise provides a converse to 6.30(b).

5 Suppose that $V$ is finite-dimensional and $U$ is a subspace of $V$. Show that
$P_{U^\perp} = I - P_U$, where $I$ is the identity operator on $V$.

6 Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V, W)$. Show that

$$T = T P_{(\operatorname{null} T)^\perp} = P_{\operatorname{range} T}\, T.$$

7 Suppose that $X$ and $Y$ are finite-dimensional subspaces of $V$. Prove that
$P_X P_Y = 0$ if and only if $\langle x, y \rangle = 0$ for all $x \in X$ and all
$y \in Y$.

8 Suppose $U$ is a finite-dimensional subspace of $V$ and $v \in V$. Define a linear
functional $\varphi \colon U \to \mathbf{F}$ by

$$\varphi(u) = \langle u, v \rangle$$

for all $u \in U$. By the Riesz representation theorem (6.42) as applied to the
inner product space $U$, there exists a unique vector $w \in U$ such that

$$\varphi(u) = \langle u, w \rangle$$

for all $u \in U$. Show that $w = P_U v$.

9 Suppose $V$ is finite-dimensional. Suppose $P \in \mathcal{L}(V)$ is such that
$P^2 = P$ and every vector in $\operatorname{null} P$ is orthogonal to every vector
in $\operatorname{range} P$. Prove that there exists a subspace $U$ of $V$ such
that $P = P_U$.


10 Suppose $V$ is finite-dimensional and $P \in \mathcal{L}(V)$ is such that
$P^2 = P$ and

$$\|Pv\| \le \|v\|$$

for every $v \in V$. Prove that there exists a subspace $U$ of $V$ such that
$P = P_U$.

11 Suppose $T \in \mathcal{L}(V)$ and $U$ is a finite-dimensional subspace of $V$.
Prove that

$$U \text{ is invariant under } T \iff P_U T P_U = T P_U.$$

12 Suppose $V$ is finite-dimensional, $T \in \mathcal{L}(V)$, and $U$ is a subspace
of $V$. Prove that

$$U \text{ and } U^\perp \text{ are both invariant under } T \iff P_U T = T P_U.$$

13 Suppose $\mathbf{F} = \mathbf{R}$ and $V$ is finite-dimensional. For each
$v \in V$, let $\varphi_v$ denote the linear functional on $V$ defined by

$$\varphi_v(u) = \langle u, v \rangle$$

for all $u \in V$.

(a) Show that $v \mapsto \varphi_v$ is an injective linear map from $V$ to $V'$.
(b) Use (a) and a dimension-counting argument to show that $v \mapsto \varphi_v$ is
an isomorphism from $V$ onto $V'$.

The purpose of this exercise is to give an alternative proof of the Riesz
representation theorem (6.42 and 6.58) when $\mathbf{F} = \mathbf{R}$. Thus you
should not use the Riesz representation theorem as a tool in your solution.

14 Suppose that $e_1, \dots, e_n$ is an orthonormal basis of $V$. Explain why the
dual basis (see 3.112) of $e_1, \dots, e_n$ is $e_1, \dots, e_n$ under the
identification of $V'$ with $V$ provided by the Riesz representation theorem (6.58).

15 In $\mathbf{R}^4$, let

$$U = \operatorname{span}((1, 1, 0, 0), (1, 1, 1, 2)).$$

Find $u \in U$ such that $\|u - (1, 2, 3, 4)\|$ is as small as possible.

16 Suppose $C[-1, 1]$ is the vector space of continuous real-valued functions
on the interval $[-1, 1]$ with inner product given by

$$\langle f, g \rangle = \int_{-1}^{1} f g$$

for all $f, g \in C[-1, 1]$. Let $U$ be the subspace of $C[-1, 1]$ defined by

$$U = \{f \in C[-1, 1] : f(0) = 0\}.$$

(a) Show that $U^\perp = \{0\}$.
(b) Show that 6.49 and 6.52 do not hold without the finite-dimensional
hypothesis.
17 Find $p \in \mathcal{P}_3(\mathbf{R})$ such that $p(0) = 0$, $p'(0) = 0$, and
$\int_0^1 |2 + 3x - p(x)|^2 \, dx$ is as small as possible.

18 Find $p \in \mathcal{P}_5(\mathbf{R})$ that makes
$\int_{-\pi}^{\pi} |\sin x - p(x)|^2 \, dx$ as small as possible.

The polynomial 6.65 is an excellent approximation to the answer to this
exercise, but here you are asked to find the exact solution, which involves
powers of $\pi$. A computer that can perform symbolic integration should
help.

19 Suppose $V$ is finite-dimensional and $P \in \mathcal{L}(V)$ is an orthogonal
projection of $V$ onto some subspace of $V$. Prove that $P^\dagger = P$.

20 Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V, W)$. Show that

$$\operatorname{null} T^\dagger = (\operatorname{range} T)^\perp \quad\text{and}\quad \operatorname{range} T^\dagger = (\operatorname{null} T)^\perp.$$

21 Suppose $T \in \mathcal{L}(\mathbf{F}^3, \mathbf{F}^2)$ is defined by

$$T(a, b, c) = (a + b + c,\; 2b + 3c).$$

(a) For $(x, y) \in \mathbf{F}^2$, find a formula for $T^\dagger(x, y)$.

(b) Verify that the equation $TT^\dagger = P_{\operatorname{range} T}$ from 6.69(b)
holds with the formula for $T^\dagger$ obtained in (a).

(c) Verify that the equation $T^\dagger T = P_{(\operatorname{null} T)^\perp}$ from
6.69(c) holds with the formula for $T^\dagger$ obtained in (a).

22 Suppose $V$ is finite-dimensional and $T \in \mathcal{L}(V, W)$. Prove that

$$TT^\dagger T = T \quad\text{and}\quad T^\dagger T T^\dagger = T^\dagger.$$

Both formulas above clearly hold if $T$ is invertible because in that case we
can replace $T^\dagger$ with $T^{-1}$.

23 Suppose $V$ and $W$ are finite-dimensional and $T \in \mathcal{L}(V, W)$. Prove
that

$$(T^\dagger)^\dagger = T.$$

The equation above is analogous to the equation $(T^{-1})^{-1} = T$ that holds if
$T$ is invertible.
Chapter 7
Operators on Inner Product Spaces


The deepest results related to inner product spaces deal with the subject to which 
we now turn—linear maps and operators on inner product spaces. As we will see, 
good theorems can be proved by exploiting properties of the adjoint. 

The hugely important spectral theorem will provide a complete description 
of self-adjoint operators on real inner product spaces and of normal operators 
on complex inner product spaces. We will then use the spectral theorem to help 
understand positive operators and unitary operators, which will lead to unitary 
matrices and matrix factorizations. The spectral theorem will also lead to the 
popular singular value decomposition, which will lead to the polar decomposition. 

The most important results in the rest of this book are valid only in finite 
dimensions. Thus from now on we assume that V and W are finite-dimensional. 


• $\mathbf{F}$ denotes $\mathbf{R}$ or $\mathbf{C}$.
• $V$ and $W$ are nonzero finite-dimensional inner product spaces over $\mathbf{F}$.
7A Self-Adjoint and Normal Operators

Adjoints

7.1 definition: adjoint, $T^*$

Suppose $T \in \mathcal{L}(V, W)$. The adjoint of $T$ is the function
$T^* \colon W \to V$ such that

$$\langle Tv, w \rangle = \langle v, T^* w \rangle$$

for every $v \in V$ and every $w \in W$.

The word adjoint has another meaning in linear algebra. In case you encounter
the second meaning elsewhere, be warned that the two meanings for adjoint are
unrelated to each other.

To see why the definition above makes sense, suppose $T \in \mathcal{L}(V, W)$. Fix
$w \in W$. Consider the linear functional

$$v \mapsto \langle Tv, w \rangle$$

on $V$ that maps $v \in V$ to $\langle Tv, w \rangle$; this linear functional
depends on $T$ and $w$. By the Riesz representation theorem (6.42), there exists a
unique vector in $V$ such that this linear functional is given by taking the inner
product with it. We call this unique vector $T^* w$. In other words, $T^* w$ is
the unique vector in $V$ such that

$$\langle Tv, w \rangle = \langle v, T^* w \rangle$$

for every $v \in V$.

In the equation above, the inner product on the left takes place in $W$ and the
inner product on the right takes place in $V$. However, we use the same notation
$\langle \cdot, \cdot \rangle$ for both inner products.


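7.2 example: adjoint of a linear map from $\mathbf{R}^3$ to $\mathbf{R}^2$

Define $T \colon \mathbf{R}^3 \to \mathbf{R}^2$ by

$$T(x_1, x_2, x_3) = (x_2 + 3x_3,\; 2x_1).$$

To compute $T^*$, suppose $(x_1, x_2, x_3) \in \mathbf{R}^3$ and
$(y_1, y_2) \in \mathbf{R}^2$. Then

$$\begin{aligned}
\langle T(x_1, x_2, x_3), (y_1, y_2) \rangle &= \langle (x_2 + 3x_3, 2x_1), (y_1, y_2) \rangle \\
&= x_2 y_1 + 3 x_3 y_1 + 2 x_1 y_2 \\
&= \langle (x_1, x_2, x_3), (2y_2, y_1, 3y_1) \rangle.
\end{aligned}$$

The equation above and the definition of the adjoint imply that

$$T^*(y_1, y_2) = (2y_2,\; y_1,\; 3y_1).$$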
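The defining identity of the adjoint can be spot-checked for Example 7.2. The short Python sketch below is an added illustration with hypothetical function names. It encodes T and the formula for the adjoint found above and compares the two inner products for given vectors. Such a check on a few vectors only illustrates the identity; it does not prove it.

```python
def T(x1, x2, x3):
    """The linear map of Example 7.2, from R^3 to R^2."""
    return (x2 + 3 * x3, 2 * x1)

def T_star(y1, y2):
    """The adjoint computed in Example 7.2, from R^2 to R^3."""
    return (2 * y2, y1, 3 * y1)

def dot(u, v):
    """Euclidean inner product on R^n."""
    return sum(p * q for p, q in zip(u, v))

def adjoint_identity_holds(x, y):
    """Compare <Tx, y> (computed in R^2) with <x, T*y> (computed in R^3)."""
    return dot(T(*x), y) == dot(x, T_star(*y))
```

With integer inputs the comparison is exact, so no tolerance is needed.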
7.3 example: adjoint of a linear map with range of dimension at most 1

Fix $u \in V$ and $x \in W$. Define $T \in \mathcal{L}(V, W)$ by

$$Tv = \langle v, u \rangle x$$

for each $v \in V$. To compute $T^*$, suppose $v \in V$ and $w \in W$. Then

$$\begin{aligned}
\langle Tv, w \rangle &= \langle \langle v, u \rangle x, w \rangle \\
&= \langle v, u \rangle \langle x, w \rangle \\
&= \langle v, \langle w, x \rangle u \rangle.
\end{aligned}$$

Thus

$$T^* w = \langle w, x \rangle u.$$

In the two examples above, $T^*$ turned out to be not just a function from $W$ to
$V$ but a linear map from $W$ to $V$. This behavior is true in general, as shown by
the next result.

The two examples above and the proof below use a common technique for
computing $T^*$: start with a formula for $\langle Tv, w \rangle$, then manipulate
it to get just $v$ in the first slot; the entry in the second slot will then be
$T^* w$.


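7.4 adjoint of a linear map is a linear map

If $T \in \mathcal{L}(V, W)$, then $T^* \in \mathcal{L}(W, V)$.

Proof Suppose $T \in \mathcal{L}(V, W)$. If $v \in V$ and $w_1, w_2 \in W$, then

$$\begin{aligned}
\langle Tv, w_1 + w_2 \rangle &= \langle Tv, w_1 \rangle + \langle Tv, w_2 \rangle \\
&= \langle v, T^* w_1 \rangle + \langle v, T^* w_2 \rangle \\
&= \langle v, T^* w_1 + T^* w_2 \rangle.
\end{aligned}$$

The equation above shows that

$$T^*(w_1 + w_2) = T^* w_1 + T^* w_2.$$

If $v \in V$, $\lambda \in \mathbf{F}$, and $w \in W$, then

$$\begin{aligned}
\langle Tv, \lambda w \rangle &= \overline{\lambda} \langle Tv, w \rangle \\
&= \overline{\lambda} \langle v, T^* w \rangle \\
&= \langle v, \lambda T^* w \rangle.
\end{aligned}$$

The equation above shows that

$$T^*(\lambda w) = \lambda T^* w.$$

Thus $T^*$ is a linear map, as desired.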
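7.5 properties of the adjoint

Suppose $T \in \mathcal{L}(V, W)$. Then

(a) $(S + T)^* = S^* + T^*$ for all $S \in \mathcal{L}(V, W)$;

(b) $(\lambda T)^* = \overline{\lambda}\, T^*$ for all $\lambda \in \mathbf{F}$;

(c) $(T^*)^* = T$;

(d) $(ST)^* = T^* S^*$ for all $S \in \mathcal{L}(W, U)$ (here $U$ is a
finite-dimensional inner product space over $\mathbf{F}$);

(e) $I^* = I$, where $I$ is the identity operator on $V$;

(f) if $T$ is invertible, then $T^*$ is invertible and $(T^*)^{-1} = (T^{-1})^*$.

Proof Suppose $v \in V$ and $w \in W$.

(a) If $S \in \mathcal{L}(V, W)$, then

$$\begin{aligned}
\langle (S + T)v, w \rangle &= \langle Sv, w \rangle + \langle Tv, w \rangle \\
&= \langle v, S^* w \rangle + \langle v, T^* w \rangle \\
&= \langle v, S^* w + T^* w \rangle.
\end{aligned}$$

Thus $(S + T)^* w = S^* w + T^* w$, as desired.

(b) If $\lambda \in \mathbf{F}$, then

$$\langle (\lambda T)v, w \rangle = \lambda \langle Tv, w \rangle = \lambda \langle v, T^* w \rangle = \langle v, \overline{\lambda}\, T^* w \rangle.$$

Thus $(\lambda T)^* w = \overline{\lambda}\, T^* w$, as desired.

(c) We have

$$\langle T^* w, v \rangle = \overline{\langle v, T^* w \rangle} = \overline{\langle Tv, w \rangle} = \langle w, Tv \rangle.$$

Thus $(T^*)^* v = Tv$, as desired.

(d) Suppose $S \in \mathcal{L}(W, U)$ and $u \in U$. Then

$$\langle (ST)v, u \rangle = \langle S(Tv), u \rangle = \langle Tv, S^* u \rangle = \langle v, T^*(S^* u) \rangle.$$

Thus $(ST)^* u = T^*(S^* u)$, as desired.

(e) Suppose $u \in V$. Then

$$\langle Iu, v \rangle = \langle u, v \rangle.$$

Thus $I^* v = v$, as desired.

(f) Suppose $T$ is invertible. Take adjoints of both sides of the equation
$T^{-1} T = I$, then use (d) and (e) to show that $T^* (T^{-1})^* = I$. Similarly,
the equation $T T^{-1} = I$ implies $(T^{-1})^* T^* = I$. Thus $(T^{-1})^*$ is the
inverse of $T^*$, as desired.

If $\mathbf{F} = \mathbf{R}$, then the map $T \mapsto T^*$ is a linear map from
$\mathcal{L}(V, W)$ to $\mathcal{L}(W, V)$, as follows from (a) and (b) of the
result above. However, if $\mathbf{F} = \mathbf{C}$, then this map is not linear
because of the complex conjugate that appears in (b).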
The next result shows the relationship between the null space and the range of
a linear map and its adjoint.

7.6 null space and range of $T^*$

Suppose $T \in \mathcal{L}(V, W)$. Then

(a) $\operatorname{null} T^* = (\operatorname{range} T)^\perp$;

(b) $\operatorname{range} T^* = (\operatorname{null} T)^\perp$;

(c) $\operatorname{null} T = (\operatorname{range} T^*)^\perp$;

(d) $\operatorname{range} T = (\operatorname{null} T^*)^\perp$.

Proof We begin by proving (a). Let $w \in W$. Then

$$\begin{aligned}
w \in \operatorname{null} T^* &\iff T^* w = 0 \\
&\iff \langle v, T^* w \rangle = 0 \text{ for all } v \in V \\
&\iff \langle Tv, w \rangle = 0 \text{ for all } v \in V \\
&\iff w \in (\operatorname{range} T)^\perp.
\end{aligned}$$

Thus $\operatorname{null} T^* = (\operatorname{range} T)^\perp$, proving (a).

If we take the orthogonal complement of both sides of (a), we get (d), where
we have used 6.52. Replacing $T$ with $T^*$ in (a) gives (c), where we have used
7.5(c). Finally, replacing $T$ with $T^*$ in (d) gives (b).


As we will soon see, the next definition is intimately connected to the matrix 
of the adjoint of a linear map. 


7.7 definition: conjugate transpose, $A^*$

The conjugate transpose of an $m$-by-$n$ matrix $A$ is the $n$-by-$m$ matrix $A^*$
obtained by interchanging the rows and columns and then taking the complex
conjugate of each entry. In other words, if $j \in \{1, \dots, n\}$ and
$k \in \{1, \dots, m\}$, then

$$(A^*)_{j,k} = \overline{A_{k,j}}.$$

7.8 example: conjugate transpose of a 2-by-3 matrix

The conjugate transpose of the 2-by-3 matrix
$\begin{pmatrix} 2 & 3 + 4i & 7 \\ 6 & 5 & 8i \end{pmatrix}$ is the 3-by-2 matrix

$$\begin{pmatrix} 2 & 6 \\ 3 - 4i & 5 \\ 7 & -8i \end{pmatrix}.$$

If a matrix $A$ has only real entries, then $A^* = A^t$, where $A^t$ denotes the
transpose of $A$ (the matrix obtained by interchanging the rows and the columns).
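Definition 7.7 translates directly into a few lines of code. The Python sketch below is an added illustration; the function name `conjugate_transpose` and the list-of-rows representation are choices made here, not notation from the text. It applies the definition to the matrix of Example 7.8.

```python
def conjugate_transpose(A):
    """Return A*, whose (j, k) entry is the complex conjugate of the (k, j) entry of A.

    A is a list of rows; each entry is converted to a Python complex number.
    """
    rows, cols = len(A), len(A[0])
    return [[complex(A[k][j]).conjugate() for k in range(rows)]
            for j in range(cols)]

# The 2-by-3 matrix of Example 7.8 (Python writes the imaginary unit as j).
A = [[2, 3 + 4j, 7],
     [6, 5, 8j]]

A_star = conjugate_transpose(A)   # a 3-by-2 matrix
```

Applying `conjugate_transpose` twice returns a matrix equal to the original, which mirrors the identity $(T^*)^* = T$ in 7.5(c).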
The next result shows how to compute the matrix of $T^*$ from the matrix of $T$.
Caution: With respect to nonorthonormal bases, the matrix of $T^*$ does not
necessarily equal the conjugate transpose of the matrix of $T$.

The adjoint of a linear map does not depend on a choice of basis. Thus
we frequently emphasize adjoints of linear maps instead of transposes or
conjugate transposes of matrices.

7.9 matrix of $T^*$ equals conjugate transpose of matrix of $T$

Let $T \in \mathcal{L}(V, W)$. Suppose $e_1, \dots, e_n$ is an orthonormal basis of
$V$ and $f_1, \dots, f_m$ is an orthonormal basis of $W$. Then
$\mathcal{M}(T^*, (f_1, \dots, f_m), (e_1, \dots, e_n))$ is the conjugate transpose
of $\mathcal{M}(T, (e_1, \dots, e_n), (f_1, \dots, f_m))$. In other words,

$$\mathcal{M}(T^*) = (\mathcal{M}(T))^*.$$

Proof In this proof, we will write $\mathcal{M}(T)$ and $\mathcal{M}(T^*)$ instead
of the longer expressions $\mathcal{M}(T, (e_1, \dots, e_n), (f_1, \dots, f_m))$ and
$\mathcal{M}(T^*, (f_1, \dots, f_m), (e_1, \dots, e_n))$.

Recall that we obtain the $k$th column of $\mathcal{M}(T)$ by writing $Te_k$ as a
linear combination of the $f_j$'s; the scalars used in this linear combination then
become the $k$th column of $\mathcal{M}(T)$. Because $f_1, \dots, f_m$ is an
orthonormal basis of $W$, we know how to write $Te_k$ as a linear combination of
the $f_j$'s [see 6.30(a)]:

$$Te_k = \langle Te_k, f_1 \rangle f_1 + \cdots + \langle Te_k, f_m \rangle f_m.$$

Thus

the entry in row $j$, column $k$, of $\mathcal{M}(T)$ is $\langle Te_k, f_j \rangle$.

In the statement above, replace $T$ with $T^*$ and interchange $e_1, \dots, e_n$ and
$f_1, \dots, f_m$. This shows that the entry in row $j$, column $k$, of
$\mathcal{M}(T^*)$ is $\langle T^* f_k, e_j \rangle$, which equals
$\langle f_k, Te_j \rangle$, which equals $\overline{\langle Te_j, f_k \rangle}$,
which equals the complex conjugate of the entry in row $k$, column $j$, of
$\mathcal{M}(T)$. Thus $\mathcal{M}(T^*) = (\mathcal{M}(T))^*$.

The Riesz representation theorem as stated in 6.58 provides an identification of
$V$ with its dual space $V'$ defined in 3.110. Under this identification, the
orthogonal complement $U^\perp$ of a subset $U \subseteq V$ corresponds to the
annihilator $U^0$ of $U$. If $U$ is a subspace of $V$, then the formulas for the
dimensions of $U^\perp$ and $U^0$ become identical under this identification (see
3.125 and 6.51).

Suppose $T \colon V \to W$ is a linear map. Under the identification of $V$ with
$V'$ and the identification of $W$ with $W'$, the adjoint map $T^* \colon W \to V$
corresponds to the dual map $T' \colon W' \to V'$ defined in 3.118, as Exercise 32
asks you to verify. Under this identification, the formulas for
$\operatorname{null} T^*$ and $\operatorname{range} T^*$ [7.6(a) and (b)] then become
identical to the formulas for $\operatorname{null} T'$ and $\operatorname{range} T'$
[3.128(a) and 3.130(b)]. Furthermore, the theorem about the matrix of $T^*$ (7.9) is
analogous to the theorem about the matrix of $T'$ (3.132).

Because orthogonal complements and adjoints are easier to deal with than
annihilators and dual maps, there is no need to work with annihilators
and dual maps in the context of inner product spaces.
Self-Adjoint Operators

Now we switch our attention to operators on inner product spaces. Instead of
considering linear maps from $V$ to $W$, we will focus on linear maps from $V$ to
$V$; recall that such linear maps are called operators.

7.10 definition: self-adjoint

An operator $T \in \mathcal{L}(V)$ is called self-adjoint if $T = T^*$.

If $T \in \mathcal{L}(V)$ and $e_1, \dots, e_n$ is an orthonormal basis of $V$, then
$T$ is self-adjoint if and only if
$\mathcal{M}(T, (e_1, \dots, e_n)) = \mathcal{M}(T, (e_1, \dots, e_n))^*$, as follows
from 7.9.

7.11 example: determining whether $T$ is self-adjoint from its matrix

Suppose $c \in \mathbf{F}$ and $T$ is the operator on $\mathbf{F}^2$ whose matrix
(with respect to the standard basis) is

$$\mathcal{M}(T) = \begin{pmatrix} 2 & c \\ 3 & 7 \end{pmatrix}.$$

The matrix of $T^*$ (with respect to the standard basis) is

$$\mathcal{M}(T^*) = \begin{pmatrix} 2 & 3 \\ \overline{c} & 7 \end{pmatrix}.$$

Thus $\mathcal{M}(T) = \mathcal{M}(T^*)$ if and only if $c = 3$. Hence the operator
$T$ is self-adjoint if and only if $c = 3$.

A good analogy to keep in mind is that the adjoint on $\mathcal{L}(V)$ plays a role
similar to that of the complex conjugate on $\mathbf{C}$. A complex number $z$ is
real if and only if $z = \overline{z}$; thus a self-adjoint operator ($T = T^*$) is
analogous to a real number.

An operator $T \in \mathcal{L}(V)$ is self-adjoint if and only if

$$\langle Tv, w \rangle = \langle v, Tw \rangle$$

for all $v, w \in V$.

We will see that the analogy discussed above is reflected in some important
properties of self-adjoint operators, beginning with eigenvalues in the next result.

If $\mathbf{F} = \mathbf{R}$, then by definition every eigenvalue is real, so the
next result is interesting only when $\mathbf{F} = \mathbf{C}$.


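7.12 eigenvalues of self-adjoint operators

Every eigenvalue of a self-adjoint operator is real.

Proof Suppose $T$ is a self-adjoint operator on $V$. Let $\lambda$ be an eigenvalue
of $T$, and let $v$ be a nonzero vector in $V$ such that $Tv = \lambda v$. Then

$$\lambda \|v\|^2 = \langle \lambda v, v \rangle = \langle Tv, v \rangle = \langle v, Tv \rangle = \langle v, \lambda v \rangle = \overline{\lambda} \|v\|^2.$$

Thus $\lambda = \overline{\lambda}$, which means that $\lambda$ is real, as desired.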
The next result is false for real inner product spaces. As an example, consider
the operator $T \in \mathcal{L}(\mathbf{R}^2)$ that is a counterclockwise rotation
of 90° around the origin; thus $T(x, y) = (-y, x)$. Notice that $Tv$ is orthogonal
to $v$ for every $v \in \mathbf{R}^2$, even though $T \ne 0$.

7.13 $Tv$ is orthogonal to $v$ for all $v$ $\iff$ $T = 0$ (assuming $\mathbf{F} = \mathbf{C}$)

Suppose $V$ is a complex inner product space and $T \in \mathcal{L}(V)$. Then

$$\langle Tv, v \rangle = 0 \text{ for every } v \in V \iff T = 0.$$

Proof If $u, w \in V$, then

$$\langle Tu, w \rangle = \frac{\langle T(u + w), u + w \rangle - \langle T(u - w), u - w \rangle}{4} + \frac{\langle T(u + iw), u + iw \rangle - \langle T(u - iw), u - iw \rangle}{4}\, i,$$

as can be verified by computing the right side. Note that each term on the right
side is of the form $\langle Tv, v \rangle$ for appropriate $v \in V$.

Now suppose $\langle Tv, v \rangle = 0$ for every $v \in V$. Then the equation above
implies that $\langle Tu, w \rangle = 0$ for all $u, w \in V$, which then implies
that $Tu = 0$ for every $u \in V$ (take $w = Tu$). Hence $T = 0$, as desired.
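The identity used in the proof of 7.13 can be checked numerically for a particular operator on $\mathbf{C}^2$. The Python sketch below is an added illustration. The names `inner`, `apply`, `quadratic`, and `polarization_rhs` are hypothetical, and matrices are assumed to be given as lists of rows with respect to the standard basis. The sketch also shows why the 90° rotation is no counterexample over $\mathbf{C}$: once complex vectors are allowed, $\langle Tv, v \rangle$ is no longer always 0.

```python
def inner(u, v):
    """Standard inner product on C^n: linear in the first slot, conjugate-linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def apply(M, v):
    """Apply the matrix M (a list of rows) to the vector v."""
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

def quadratic(M, v):
    """The scalar <Mv, v>."""
    return inner(apply(M, v), v)

def polarization_rhs(M, u, w):
    """Right side of the identity in the proof of 7.13, built only from values <Tv, v>."""
    plus = [a + b for a, b in zip(u, w)]
    minus = [a - b for a, b in zip(u, w)]
    plus_i = [a + 1j * b for a, b in zip(u, w)]
    minus_i = [a - 1j * b for a, b in zip(u, w)]
    return ((quadratic(M, plus) - quadratic(M, minus)) / 4
            + (quadratic(M, plus_i) - quadratic(M, minus_i)) / 4 * 1j)

# Counterclockwise rotation by 90 degrees: T(x, y) = (-y, x).
ROTATION = [[0, -1], [1, 0]]
```

For a real vector such as `[3, 4]`, `quadratic(ROTATION, [3, 4])` is 0, in line with the remark before 7.13. For the complex vector `[1, 1j]` it is nonzero, which is consistent with 7.13.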


The next result is false for real inner product spaces, as shown by considering
any operator on a real inner product space that is not self-adjoint.

The next result provides another good example of how self-adjoint operators
behave like real numbers.


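7.14 $\langle Tv, v \rangle$ is real for all $v$ $\iff$ $T$ is self-adjoint (assuming $\mathbf{F} = \mathbf{C}$)

Suppose $V$ is a complex inner product space and $T \in \mathcal{L}(V)$. Then

$$T \text{ is self-adjoint} \iff \langle Tv, v \rangle \in \mathbf{R} \text{ for every } v \in V.$$

Proof If $v \in V$, then

7.15 $$\langle T^* v, v \rangle = \overline{\langle v, T^* v \rangle} = \overline{\langle Tv, v \rangle}.$$

Now

$$\begin{aligned}
T \text{ is self-adjoint} &\iff T - T^* = 0 \\
&\iff \langle (T - T^*)v, v \rangle = 0 \text{ for every } v \in V \\
&\iff \langle Tv, v \rangle - \overline{\langle Tv, v \rangle} = 0 \text{ for every } v \in V \\
&\iff \langle Tv, v \rangle \in \mathbf{R} \text{ for every } v \in V,
\end{aligned}$$

where the second equivalence follows from 7.13 as applied to $T - T^*$ and the
third equivalence follows from 7.15.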
On a real inner product space $V$, a nonzero operator $T$ might satisfy
$\langle Tv, v \rangle = 0$ for all $v \in V$. However, the next result shows that
this cannot happen for a self-adjoint operator.

7.16 $T$ self-adjoint and $\langle Tv, v \rangle = 0$ for all $v$ $\iff$ $T = 0$

Suppose $T$ is a self-adjoint operator on $V$. Then

$$\langle Tv, v \rangle = 0 \text{ for every } v \in V \iff T = 0.$$

Proof We have already proved this (without the hypothesis that $T$ is self-adjoint)
when $V$ is a complex inner product space (see 7.13). Thus we can assume that $V$
is a real inner product space. If $u, w \in V$, then

7.17 $$\langle Tu, w \rangle = \frac{\langle T(u + w), u + w \rangle - \langle T(u - w), u - w \rangle}{4},$$

as can be proved by computing the right side using the equation

$$\langle Tw, u \rangle = \langle w, Tu \rangle = \langle Tu, w \rangle,$$

where the first equality holds because $T$ is self-adjoint and the second equality
holds because we are working in a real inner product space.

Now suppose $\langle Tv, v \rangle = 0$ for every $v \in V$. Because each term on
the right side of 7.17 is of the form $\langle Tv, v \rangle$ for appropriate $v$,
this implies that $\langle Tu, w \rangle = 0$ for all $u, w \in V$. This implies
that $Tu = 0$ for every $u \in V$ (take $w = Tu$). Hence $T = 0$, as desired.


Normal Operators 


7.18 definition: normal

• An operator on an inner product space is called normal if it commutes with
its adjoint.

• In other words, $T \in \mathcal{L}(V)$ is normal if $TT^* = T^*T$.

Every self-adjoint operator is normal, because if $T$ is self-adjoint then
$T^* = T$ and hence $T$ commutes with $T^*$.

7.19 example: an operator that is normal but not self-adjoint

Let $T$ be the operator on $\mathbf{F}^2$ whose matrix (with respect to the standard
basis) is

$$\begin{pmatrix} 2 & -3 \\ 3 & 2 \end{pmatrix}.$$

Thus $T(w, z) = (2w - 3z,\; 3w + 2z)$.

This operator $T$ is not self-adjoint because the entry in row 2, column 1 (which
equals 3) does not equal the complex conjugate of the entry in row 1, column 2
(which equals $-3$).

The matrix of $TT^*$ equals

$$\begin{pmatrix} 2 & -3 \\ 3 & 2 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ -3 & 2 \end{pmatrix}, \text{ which equals } \begin{pmatrix} 13 & 0 \\ 0 & 13 \end{pmatrix}.$$

Similarly, the matrix of $T^*T$ equals

$$\begin{pmatrix} 2 & 3 \\ -3 & 2 \end{pmatrix} \begin{pmatrix} 2 & -3 \\ 3 & 2 \end{pmatrix}, \text{ which equals } \begin{pmatrix} 13 & 0 \\ 0 & 13 \end{pmatrix}.$$

Because $TT^*$ and $T^*T$ have the same matrix, we see that $TT^* = T^*T$. Thus $T$
is normal.
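The matrix computations in Example 7.19 are easy to reproduce. The Python sketch below is an added illustration with hypothetical helper names. It assumes matrices with respect to an orthonormal basis, so that by 7.9 the adjoint corresponds to the conjugate transpose. It forms both products and tests normality and self-adjointness for the matrix of the example.

```python
def conjugate_transpose(M):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[complex(M[k][j]).conjugate() for k in range(len(M))]
            for j in range(len(M[0]))]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))]
            for r in range(len(A))]

def is_self_adjoint(M):
    """True if M equals its conjugate transpose."""
    return M == conjugate_transpose(M)

def is_normal(M):
    """True if M commutes with its conjugate transpose."""
    M_star = conjugate_transpose(M)
    return matmul(M, M_star) == matmul(M_star, M)

# Matrix of the operator T in Example 7.19, with respect to the standard basis.
M = [[2, -3], [3, 2]]
TT_star = matmul(M, conjugate_transpose(M))
T_star_T = matmul(conjugate_transpose(M), M)
```

The entries here are small integers, so the equality tests are exact. For matrices with general floating-point entries a tolerance-based comparison would be the safer choice.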


In the next section we will see why normal operators are worthy of special 
attention. The next result provides a useful characterization of normal operators. 


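7.20 $T$ is normal if and only if $Tv$ and $T^*v$ have the same norm

Suppose $T \in \mathcal{L}(V)$. Then

$$T \text{ is normal} \iff \|Tv\| = \|T^* v\| \text{ for every } v \in V.$$

Proof We have

$$\begin{aligned}
T \text{ is normal} &\iff T^*T - TT^* = 0 \\
&\iff \langle (T^*T - TT^*)v, v \rangle = 0 \text{ for every } v \in V \\
&\iff \langle T^*Tv, v \rangle = \langle TT^*v, v \rangle \text{ for every } v \in V \\
&\iff \langle Tv, Tv \rangle = \langle T^*v, T^*v \rangle \text{ for every } v \in V \\
&\iff \|Tv\|^2 = \|T^*v\|^2 \text{ for every } v \in V \\
&\iff \|Tv\| = \|T^*v\| \text{ for every } v \in V,
\end{aligned}$$

where we used 7.16 to establish the second equivalence (note that the operator
$T^*T - TT^*$ is self-adjoint).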


The next result presents several consequences of the result above. Compare 
(e) of the next result to Exercise 3. That exercise states that the eigenvalues of 
the adjoint of each operator are equal (as a set) to the complex conjugates of 
the eigenvalues of the operator. The exercise says nothing about eigenvectors, 
because an operator and its adjoint may have different eigenvectors. However, 
(e) of the next result implies that a normal operator and its adjoint have the same 
eigenvectors. 
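7.21 range, null space, and eigenvectors of a normal operator

Suppose $T \in \mathcal{L}(V)$ is normal. Then

(a) $\operatorname{null} T = \operatorname{null} T^*$;

(b) $\operatorname{range} T = \operatorname{range} T^*$;

(c) $V = \operatorname{null} T \oplus \operatorname{range} T$;

(d) $T - \lambda I$ is normal for every $\lambda \in \mathbf{F}$;

(e) if $v \in V$ and $\lambda \in \mathbf{F}$, then $Tv = \lambda v$ if and only if
$T^* v = \overline{\lambda} v$.

Proof

(a) Suppose $v \in V$. Then

$$v \in \operatorname{null} T \iff \|Tv\| = 0 \iff \|T^* v\| = 0 \iff v \in \operatorname{null} T^*,$$

where the middle equivalence above follows from 7.20. Thus
$\operatorname{null} T = \operatorname{null} T^*$.

(b) We have

$$\operatorname{range} T = (\operatorname{null} T^*)^\perp = (\operatorname{null} T)^\perp = \operatorname{range} T^*,$$

where the first equality comes from 7.6(d), the second equality comes from
(a) in this result, and the third equality comes from 7.6(b).

(c) We have

$$V = (\operatorname{null} T) \oplus (\operatorname{null} T)^\perp = \operatorname{null} T \oplus \operatorname{range} T^* = \operatorname{null} T \oplus \operatorname{range} T,$$

where the first equality comes from 6.49, the second equality comes from
7.6(b), and the third equality comes from (b) in this result.

(d) Suppose $\lambda \in \mathbf{F}$. Then

$$\begin{aligned}
(T - \lambda I)(T - \lambda I)^* &= (T - \lambda I)(T^* - \overline{\lambda} I) \\
&= TT^* - \overline{\lambda} T - \lambda T^* + |\lambda|^2 I \\
&= T^*T - \overline{\lambda} T - \lambda T^* + |\lambda|^2 I \\
&= (T^* - \overline{\lambda} I)(T - \lambda I) \\
&= (T - \lambda I)^*(T - \lambda I).
\end{aligned}$$

Thus $T - \lambda I$ commutes with its adjoint. Hence $T - \lambda I$ is normal.

(e) Suppose $v \in V$ and $\lambda \in \mathbf{F}$. Then (d) and 7.20 imply that

$$\|(T - \lambda I)v\| = \|(T - \lambda I)^* v\| = \|(T^* - \overline{\lambda} I)v\|.$$

Thus $\|(T - \lambda I)v\| = 0$ if and only if $\|(T^* - \overline{\lambda} I)v\| = 0$.
Hence $Tv = \lambda v$ if and only if $T^* v = \overline{\lambda} v$.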
Because every self-adjoint operator is normal, the next result applies in
particular to self-adjoint operators.

7.22 orthogonal eigenvectors for normal operators

Suppose $T \in \mathcal{L}(V)$ is normal. Then eigenvectors of $T$ corresponding to
distinct eigenvalues are orthogonal.

Proof Suppose $\alpha, \beta$ are distinct eigenvalues of $T$, with corresponding
eigenvectors $u, v$. Thus $Tu = \alpha u$ and $Tv = \beta v$. From 7.21(e) we have
$T^* v = \overline{\beta} v$. Thus

$$\begin{aligned}
(\alpha - \beta) \langle u, v \rangle &= \langle \alpha u, v \rangle - \langle u, \overline{\beta} v \rangle \\
&= \langle Tu, v \rangle - \langle u, T^* v \rangle \\
&= 0.
\end{aligned}$$

Because $\alpha \ne \beta$, the equation above implies that $\langle u, v \rangle = 0$.
Thus $u$ and $v$ are orthogonal, as desired.


As stated here, the next result makes sense only when F = C. However, see 
Exercise 12 for a version that makes sense when F = C and when F = R. 

Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Under the analogy
between $\mathcal{L}(V)$ and $\mathbf{C}$, with the adjoint on $\mathcal{L}(V)$
playing a similar role to that of the complex conjugate on $\mathbf{C}$, the
operators $A$ and $B$ as defined by 7.24 correspond to the real and imaginary
parts of $T$. Thus the informal title of the result below should make sense.

7.23 $T$ is normal $\iff$ the real and imaginary parts of $T$ commute

Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Then $T$ is normal if
and only if there exist commuting self-adjoint operators $A$ and $B$ such that
$T = A + iB$.

Proof First suppose $T$ is normal. Let

7.24 $$A = \frac{T + T^*}{2} \quad\text{and}\quad B = \frac{T - T^*}{2i}.$$

Then $A$ and $B$ are self-adjoint and $T = A + iB$. A quick computation shows that

7.25 $$AB - BA = \frac{T^*T - TT^*}{2i}.$$

Because $T$ is normal, the right side of the equation above equals 0. Thus the
operators $A$ and $B$ commute, as desired.

To prove the implication in the other direction, now suppose there exist
commuting self-adjoint operators $A$ and $B$ such that $T = A + iB$. Then
$T^* = A - iB$. Adding the last two equations and then dividing by 2 produces the
equation for $A$ in 7.24. Subtracting the last two equations and then dividing by
$2i$ produces the equation for $B$ in 7.24. Now 7.24 implies 7.25. Because $B$ and
$A$ commute, 7.25 implies that $T$ is normal, as desired.
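The decomposition in 7.24 can be carried out concretely for the normal operator of Example 7.19. The Python sketch below is an added illustration. The helper names are hypothetical, and matrices are taken with respect to the standard basis of $\mathbf{C}^2$, so that the adjoint corresponds to the conjugate transpose (7.9). It builds $A$ and $B$ from the matrix of $T$ and compares results up to a small floating-point tolerance.

```python
def conjugate_transpose(M):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[complex(M[k][j]).conjugate() for k in range(len(M))]
            for j in range(len(M[0]))]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))]
            for r in range(len(A))]

def combine(A, B, alpha, beta):
    """Entrywise linear combination alpha*A + beta*B."""
    return [[alpha * a + beta * b for a, b in zip(ra, rb)]
            for ra, rb in zip(A, B)]

def close(A, B, tol=1e-12):
    """True if corresponding entries of A and B differ by at most tol."""
    return all(abs(a - b) <= tol
               for ra, rb in zip(A, B) for a, b in zip(ra, rb))

T = [[2, -3], [3, 2]]                       # the normal operator of Example 7.19
T_star = conjugate_transpose(T)
A = combine(T, T_star, 0.5, 0.5)            # (T + T*) / 2, as in 7.24
B = combine(T, T_star, 1 / 2j, -1 / 2j)     # (T - T*) / (2i), as in 7.24
```

For this $T$, the matrix of $A$ is twice the identity matrix and $B$ has off-diagonal entries $3i$ and $-3i$. Both are self-adjoint, they commute, and $A + iB$ recovers $T$, as 7.23 predicts for a normal operator.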
Exercises 7A

1 Suppose $n$ is a positive integer. Define $T \in \mathcal{L}(\mathbf{F}^n)$ by

$$T(z_1, \dots, z_n) = (0, z_1, \dots, z_{n-1}).$$

Find a formula for $T^*(z_1, \dots, z_n)$.

2 Suppose $T \in \mathcal{L}(V, W)$. Prove that

$$T = 0 \iff T^* = 0 \iff T^*T = 0 \iff TT^* = 0.$$

3 Suppose $T \in \mathcal{L}(V)$ and $\lambda \in \mathbf{F}$. Prove that

$$\lambda \text{ is an eigenvalue of } T \iff \overline{\lambda} \text{ is an eigenvalue of } T^*.$$

4 Suppose $T \in \mathcal{L}(V)$ and $U$ is a subspace of $V$. Prove that

$$U \text{ is invariant under } T \iff U^\perp \text{ is invariant under } T^*.$$

5 Suppose $T \in \mathcal{L}(V, W)$. Suppose $e_1, \dots, e_n$ is an orthonormal
basis of $V$ and $f_1, \dots, f_m$ is an orthonormal basis of $W$. Prove that

$$\|Te_1\|^2 + \cdots + \|Te_n\|^2 = \|T^* f_1\|^2 + \cdots + \|T^* f_m\|^2.$$

The numbers $\|Te_1\|^2, \dots, \|Te_n\|^2$ in the equation above depend on the
orthonormal basis $e_1, \dots, e_n$, but the right side of the equation does not
depend on $e_1, \dots, e_n$. Thus the equation above shows that the sum on the left
side does not depend on which orthonormal basis $e_1, \dots, e_n$ is used.

6 Suppose $T \in \mathcal{L}(V, W)$. Prove that

(a) $T$ is injective $\iff$ $T^*$ is surjective;

(b) $T$ is surjective $\iff$ $T^*$ is injective.

7 Prove that if $T \in \mathcal{L}(V, W)$, then

(a) $\dim \operatorname{null} T^* = \dim \operatorname{null} T + \dim W - \dim V$;
(b) $\dim \operatorname{range} T^* = \dim \operatorname{range} T$.

8 Suppose $A$ is an $m$-by-$n$ matrix with entries in $\mathbf{F}$. Use (b) in
Exercise 7 to prove that the row rank of $A$ equals the column rank of $A$.

This exercise asks for yet another alternative proof of a result that was
previously proved in 3.57 and 3.133.

9 Prove that the product of two self-adjoint operators on $V$ is self-adjoint if
and only if the two operators commute.

10 Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Prove that $T$ is
self-adjoint if and only if

$$\langle Tv, v \rangle = \langle T^* v, v \rangle$$

for all $v \in V$.
11 Define an operator $S\colon \mathbf{F}^2 \to \mathbf{F}^2$ by $S(w, z) = (-z, w)$.

(a) Find a formula for $S^*$.
(b) Show that $S$ is normal but not self-adjoint.
(c) Find all eigenvalues of $S$.

If $\mathbf{F} = \mathbf{R}$, then $S$ is the operator on $\mathbf{R}^2$ of counterclockwise rotation by 90°.

12 An operator $B \in \mathcal{L}(V)$ is called skew if
$$B^* = -B.$$
Suppose that $T \in \mathcal{L}(V)$. Prove that $T$ is normal if and only if there exist
commuting operators $A$ and $B$ such that $A$ is self-adjoint, $B$ is a skew operator,
and $T = A + B$.

13 Suppose $\mathbf{F} = \mathbf{R}$. Define $\mathcal{A} \in \mathcal{L}(\mathcal{L}(V))$ by $\mathcal{A}T = T^*$ for all $T \in \mathcal{L}(V)$.

(a) Find all eigenvalues of $\mathcal{A}$.
(b) Find the minimal polynomial of $\mathcal{A}$.

14 Define an inner product on $\mathcal{P}_2(\mathbf{R})$ by $\langle p, q\rangle = \int_0^1 pq$. Define an operator
$T \in \mathcal{L}(\mathcal{P}_2(\mathbf{R}))$ by
$$T(ax^2 + bx + c) = bx.$$

(a) Show that with this inner product, the operator $T$ is not self-adjoint.
(b) The matrix of $T$ with respect to the basis $1, x, x^2$ is
$$\begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
This matrix equals its conjugate transpose, even though $T$ is not self-adjoint.
Explain why this is not a contradiction.

15 Suppose $T \in \mathcal{L}(V)$ is invertible. Prove that

(a) $T$ is self-adjoint $\iff$ $T^{-1}$ is self-adjoint;
(b) $T$ is normal $\iff$ $T^{-1}$ is normal.

16 Suppose $\mathbf{F} = \mathbf{R}$.

(a) Show that the set of self-adjoint operators on $V$ is a subspace of $\mathcal{L}(V)$.
(b) What is the dimension of the subspace of $\mathcal{L}(V)$ in (a) [in terms of
$\dim V$]?

17 Suppose $\mathbf{F} = \mathbf{C}$. Show that the set of self-adjoint operators on $V$ is not a
subspace of $\mathcal{L}(V)$.

18 Suppose $\dim V \ge 2$. Show that the set of normal operators on $V$ is not a
subspace of $\mathcal{L}(V)$.


19 Suppose $T \in \mathcal{L}(V)$ and $\|T^*v\| \le \|Tv\|$ for every $v \in V$. Prove that $T$ is
normal.

This exercise fails on infinite-dimensional inner product spaces, leading to
what are called hyponormal operators, which have a well-developed theory.

20 Suppose $P \in \mathcal{L}(V)$ is such that $P^2 = P$. Prove that the following are
equivalent.

(a) $P$ is self-adjoint.

(b) $P$ is normal.

(c) There is a subspace $U$ of $V$ such that $P = P_U$.

21 Suppose $D\colon \mathcal{P}_8(\mathbf{R}) \to \mathcal{P}_8(\mathbf{R})$ is the differentiation operator defined by
$Dp = p'$. Prove that there does not exist an inner product on $\mathcal{P}_8(\mathbf{R})$ that
makes $D$ a normal operator.

22 Give an example of an operator $T \in \mathcal{L}(\mathbf{R}^3)$ such that $T$ is normal but not
self-adjoint.

23 Suppose $T$ is a normal operator on $V$. Suppose also that $v, w \in V$ satisfy the
equations
$$\|v\| = \|w\| = 2, \quad Tv = 3v, \quad Tw = 4w.$$
Show that $\|T(v + w)\| = 10$.

24 Suppose $T \in \mathcal{L}(V)$ and
$$a_0 + a_1 z + a_2 z^2 + \cdots + a_{m-1} z^{m-1} + z^m$$
is the minimal polynomial of $T$. Prove that the minimal polynomial of $T^*$ is
$$\overline{a_0} + \overline{a_1} z + \overline{a_2} z^2 + \cdots + \overline{a_{m-1}} z^{m-1} + z^m.$$

This exercise shows that the minimal polynomial of $T^*$ equals the minimal
polynomial of $T$ if $\mathbf{F} = \mathbf{R}$.

25 Suppose $T \in \mathcal{L}(V)$. Prove that $T$ is diagonalizable if and only if $T^*$ is
diagonalizable.

26 Fix $u, x \in V$. Define $T \in \mathcal{L}(V)$ by $Tv = \langle v, u\rangle x$ for every $v \in V$.

(a) Prove that if $V$ is a real vector space, then $T$ is self-adjoint if and only if
the list $u, x$ is linearly dependent.
(b) Prove that $T$ is normal if and only if the list $u, x$ is linearly dependent.

27 Suppose $T \in \mathcal{L}(V)$ is normal. Prove that
$$\operatorname{null} T^k = \operatorname{null} T \quad\text{and}\quad \operatorname{range} T^k = \operatorname{range} T$$
for every positive integer $k$.

28 Suppose $T \in \mathcal{L}(V)$ is normal. Prove that if $\lambda \in \mathbf{F}$, then the minimal
polynomial of $T$ is not a polynomial multiple of $(x - \lambda)^2$.
29 Prove or give a counterexample: If $T \in \mathcal{L}(V)$ and there is an orthonormal
basis $e_1, \dots, e_n$ of $V$ such that $\|Te_k\| = \|T^*e_k\|$ for each $k = 1, \dots, n$, then $T$ is
normal.

30 Suppose that $T \in \mathcal{L}(\mathbf{F}^3)$ is normal and $T(1, 1, 1) = (2, 2, 2)$. Suppose
$(z_1, z_2, z_3) \in \operatorname{null} T$. Prove that $z_1 + z_2 + z_3 = 0$.

31 Fix a positive integer $n$. In the inner product space of continuous real-valued
functions on $[-\pi, \pi]$ with inner product $\langle f, g\rangle = \int_{-\pi}^{\pi} fg$, let
$$V = \operatorname{span}(1, \cos x, \cos 2x, \dots, \cos nx, \sin x, \sin 2x, \dots, \sin nx).$$

(a) Define $D \in \mathcal{L}(V)$ by $Df = f'$. Show that $D^* = -D$. Conclude that $D$
is normal but not self-adjoint.
(b) Define $T \in \mathcal{L}(V)$ by $Tf = f''$. Show that $T$ is self-adjoint.

32 Suppose $T\colon V \to W$ is a linear map. Show that under the standard identification
of $V$ with $V'$ (see 6.58) and the corresponding identification of $W$ with
$W'$, the adjoint map $T^*\colon W \to V$ corresponds to the dual map $T'\colon W' \to V'$.
More precisely, show that
$$T'(\varphi_w) = \varphi_{T^*w}$$
for all $w \in W$, where $\varphi_w$ and $\varphi_{T^*w}$ are defined as in 6.58.
7B Spectral Theorem


Recall that a diagonal matrix is a square matrix that is 0 everywhere except 
possibly on the diagonal. Recall that an operator on V is called diagonalizable if 
the operator has a diagonal matrix with respect to some basis of V. Recall also 
that this happens if and only if there is a basis of V consisting of eigenvectors of 
the operator (see 5.55). 

The nicest operators on V are those for which there is an orthonormal basis 
of V with respect to which the operator has a diagonal matrix. These are precisely 
the operators T € £(V) such that there is an orthonormal basis of V consisting 
of eigenvectors of T. Our goal in this section is to prove the spectral theorem, 
which characterizes these operators as the self-adjoint operators when F = R and 
as the normal operators when F = C. 

The spectral theorem is probably the most useful tool in the study of operators 
on inner product spaces. Its extension to certain infinite-dimensional inner product 
spaces (see, for example, Section 10D of the author’s book Measure, Integration 
& Real Analysis) plays a key role in functional analysis. 

Because the conclusion of the spectral theorem depends on F, we will break 
the spectral theorem into two pieces, called the real spectral theorem and the 
complex spectral theorem. 


Real Spectral Theorem 


To prove the real spectral theorem, we will need two preliminary results. These 
preliminary results hold on both real and complex inner product spaces, but they 
are not needed for the proof of the complex spectral theorem. 

You could guess that the next result is true and even discover its proof by
thinking about quadratic polynomials with real coefficients. Specifically, suppose
$b, c \in \mathbf{R}$ and $b^2 < 4c$. Let $x$ be a real number. Then

$$x^2 + bx + c = \Bigl(x + \frac{b}{2}\Bigr)^2 + \Bigl(c - \frac{b^2}{4}\Bigr) > 0.$$

This completing-the-square technique can be used to derive the quadratic formula.

In particular, $x^2 + bx + c$ is an invertible real number (a convoluted way of saying
that it is not 0). Replacing the real number $x$ with a self-adjoint operator (recall the
analogy between real numbers and self-adjoint operators) leads to the next result.


7.26 invertible quadratic expressions 


Suppose $T \in \mathcal{L}(V)$ is self-adjoint and $b, c \in \mathbf{R}$ are such that $b^2 < 4c$. Then

$$T^2 + bT + cI$$


is an invertible operator. 




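Proof Let $v$ be a nonzero vector in $V$. Then
$$\begin{aligned}
\langle (T^2 + bT + cI)v, v\rangle &= \langle T^2v, v\rangle + b\langle Tv, v\rangle + c\langle v, v\rangle \\
&= \langle Tv, Tv\rangle + b\langle Tv, v\rangle + c\|v\|^2 \\
&\ge \|Tv\|^2 - |b|\,\|Tv\|\,\|v\| + c\|v\|^2 \\
&= \Bigl(\|Tv\| - \frac{|b|\,\|v\|}{2}\Bigr)^2 + \Bigl(c - \frac{b^2}{4}\Bigr)\|v\|^2 \\
&> 0,
\end{aligned}$$
where the third line above holds by the Cauchy-Schwarz inequality (6.14). The
last inequality implies that $(T^2 + bT + cI)v \ne 0$. Thus $T^2 + bT + cI$ is injective,
which implies that it is invertible (see 3.65).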
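The chain of inequalities in the proof of 7.26 can be traced on a concrete case. The following sketch is an illustration only: the symmetric matrix, the numbers $b = c = 1$, the vector $v$, and the helper names are our own hypothetical choices. It evaluates $\langle (T^2 + bT + cI)v, v\rangle$ together with the two lower bounds that appear in the proof.

```python
import math

def apply(M, v):
    """Apply the matrix M (nested lists) to the vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def inner(u, v):
    """Euclidean inner product on R^n."""
    return sum(x * y for x, y in zip(u, v))

T = [[2, 1], [1, -3]]   # real symmetric, hence self-adjoint (our sample choice)
b, c = 1, 1             # b^2 = 1 < 4 = 4c
v = [1, 2]

Tv = apply(T, v)
T2v = apply(T, Tv)

# <(T^2 + bT + cI)v, v> computed term by term
lhs = inner(T2v, v) + b * inner(Tv, v) + c * inner(v, v)

norm_Tv = math.sqrt(inner(Tv, Tv))
norm_v = math.sqrt(inner(v, v))

# Third line of the proof (after Cauchy-Schwarz) and the completed-square form
lower = norm_Tv**2 - abs(b) * norm_Tv * norm_v + c * norm_v**2
completed = (norm_Tv - abs(b) * norm_v / 2) ** 2 + (c - b**2 / 4) * norm_v**2
```

Here `lhs` dominates `lower`, and `lower` agrees with `completed` up to rounding; both are positive, as the proof predicts.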


The next result will be a key tool in our proof of the real spectral theorem. 


7.27 minimal polynomial of self-adjoint operator


Suppose $T \in \mathcal{L}(V)$ is self-adjoint. Then the minimal polynomial of $T$ equals
$(z - \lambda_1) \cdots (z - \lambda_m)$ for some $\lambda_1, \dots, \lambda_m \in \mathbf{R}$.


Proof First suppose $\mathbf{F} = \mathbf{C}$. The zeros of the minimal polynomial of $T$ are the
eigenvalues of $T$ [by 5.27(a)]. All eigenvalues of $T$ are real (by 7.12). Thus the
second version of the fundamental theorem of algebra (see 6.69) tells us that the
minimal polynomial of $T$ has the desired form.

Now suppose $\mathbf{F} = \mathbf{R}$. By the factorization of a polynomial over $\mathbf{R}$ (see 4.16)
there exist $\lambda_1, \dots, \lambda_m \in \mathbf{R}$ and $b_1, \dots, b_N, c_1, \dots, c_N \in \mathbf{R}$ with $b_k^2 < 4c_k$ for each $k$
such that the minimal polynomial of $T$ equals

7.28    $$(z - \lambda_1) \cdots (z - \lambda_m)(z^2 + b_1 z + c_1) \cdots (z^2 + b_N z + c_N);$$

here either $m$ or $N$ might equal 0, meaning that there are no terms of the
corresponding form. Now

$$(T - \lambda_1 I) \cdots (T - \lambda_m I)(T^2 + b_1 T + c_1 I) \cdots (T^2 + b_N T + c_N I) = 0.$$

If $N > 0$, then we could multiply both sides of the equation above on the right by
the inverse of $T^2 + b_N T + c_N I$ (which is an invertible operator by 7.26) to obtain a
polynomial expression of $T$ that equals 0. The corresponding polynomial would
have degree two less than the degree of 7.28, violating the minimality of the
degree of the polynomial with this property. Thus we must have $N = 0$, which
means that the minimal polynomial in 7.28 has the form $(z - \lambda_1) \cdots (z - \lambda_m)$, as
desired.


The result above along with 5.27(a) implies that every self-adjoint operator 
has an eigenvalue. In fact, as we will see in the next result, self-adjoint operators 
have enough eigenvectors to form a basis. 
The next result, which gives a complete description of the self-adjoint operators
on a real inner product space, is one of the major theorems in linear algebra. 


7.29 real spectral theorem 


Suppose $\mathbf{F} = \mathbf{R}$ and $T \in \mathcal{L}(V)$. Then the following are equivalent.


(a) T is self-adjoint. 


(b) T has a diagonal matrix with respect to some orthonormal basis of V. 


(c) V has an orthonormal basis consisting of eigenvectors of T. 


Proof First suppose (a) holds, so T is self-adjoint. Our results on minimal poly- 
nomials, specifically 6.37 and 7.27, imply that T has an upper-triangular matrix 
with respect to some orthonormal basis of V. With respect to this orthonormal 
basis, the matrix of T* is the transpose of the matrix of T. However, T* = T. 
Thus the transpose of the matrix of T equals the matrix of T. Because the matrix 
of T is upper-triangular, this means that all entries of the matrix above and below 
the diagonal are 0. Hence the matrix of T is a diagonal matrix with respect to the 
orthonormal basis. Thus (a) implies (b). 

Conversely, now suppose (b) holds, so T has a diagonal matrix with respect to 
some orthonormal basis of V. That diagonal matrix equals its transpose. Thus 
with respect to that basis, the matrix of T* equals the matrix of T. Hence T* = T, 
proving that (b) implies (a). 

The equivalence of (b) and (c) follows from the definitions [or see the proof 
that (a) and (b) are equivalent in 5.55]. 


7.30 example: an orthonormal basis of eigenvectors for an operator 


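Consider the operator $T$ on $\mathbf{R}^3$ whose matrix (with respect to the standard
basis) is
$$\begin{pmatrix} 14 & -13 & 8 \\ -13 & 14 & 8 \\ 8 & 8 & -7 \end{pmatrix}.$$

This matrix with real entries equals its transpose; thus $T$ is self-adjoint. As you
can verify,
$$\frac{(1, -1, 0)}{\sqrt{2}}, \quad \frac{(1, 1, 1)}{\sqrt{3}}, \quad \frac{(1, 1, -2)}{\sqrt{6}}$$
is an orthonormal basis of $\mathbf{R}^3$ consisting of eigenvectors of $T$. With respect to
this basis, the matrix of $T$ is the diagonal matrix
$$\begin{pmatrix} 27 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & -15 \end{pmatrix}.$$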


See Exercise 17 for a version of the real spectral theorem that applies simulta- 
neously to more than one operator. 
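The verification requested in Example 7.30 can be carried out by hand or, as in the sketch below, with exact integer arithmetic. The helper names are our own; the matrix, the (unnormalized) vectors, and the eigenvalues are the ones stated in the example.

```python
A = [[14, -13, 8], [-13, 14, 8], [8, 8, -7]]

# Unnormalized eigenvectors from Example 7.30, paired with the claimed eigenvalues.
candidates = [([1, -1, 0], 27), ([1, 1, 1], 9), ([1, 1, -2], -15)]

def apply(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# A v == lambda v for each pair
eigen_ok = all(apply(A, v) == [lam * x for x in v] for v, lam in candidates)

# The three vectors are pairwise orthogonal
orthogonal_ok = all(
    dot(candidates[i][0], candidates[j][0]) == 0
    for i in range(3) for j in range(i + 1, 3)
)

# Squared norms, which explain the normalizing factors sqrt(2), sqrt(3), sqrt(6)
squared_norms = [dot(v, v) for v, _ in candidates]
```

Because scaling an eigenvector does not change its eigenvalue or its orthogonality to other vectors, checking the unnormalized vectors suffices.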
Complex Spectral Theorem


The next result gives a complete description of the normal operators on a complex 
inner product space. 


7.31 complex spectral theorem 


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Then the following are equivalent.

(a) $T$ is normal.


(b) T has a diagonal matrix with respect to some orthonormal basis of V. 


(c) V has an orthonormal basis consisting of eigenvectors of T. 


Proof First suppose (a) holds, so $T$ is normal. By Schur's theorem (6.38), there is
an orthonormal basis $e_1, \dots, e_n$ of $V$ with respect to which $T$ has an upper-triangular
matrix. Thus we can write

7.32    $$\mathcal{M}\bigl(T, (e_1, \dots, e_n)\bigr) = \begin{pmatrix} a_{1,1} & \cdots & a_{1,n} \\ & \ddots & \vdots \\ 0 & & a_{n,n} \end{pmatrix}.$$

We will show that this matrix is actually a diagonal matrix.
We see from the matrix above that
$$\|Te_1\|^2 = |a_{1,1}|^2,$$
$$\|T^*e_1\|^2 = |a_{1,1}|^2 + |a_{1,2}|^2 + \cdots + |a_{1,n}|^2.$$

Because $T$ is normal, $\|Te_1\| = \|T^*e_1\|$ (see 7.20). Thus the two equations above
imply that all entries in the first row of the matrix in 7.32, except possibly the first
entry $a_{1,1}$, equal 0.

Now 7.32 implies
$$\|Te_2\|^2 = |a_{2,2}|^2$$
(because $a_{1,2} = 0$, as we showed in the paragraph above) and
$$\|T^*e_2\|^2 = |a_{2,2}|^2 + |a_{2,3}|^2 + \cdots + |a_{2,n}|^2.$$
Because $T$ is normal, $\|Te_2\| = \|T^*e_2\|$. Thus the two equations above imply that
all entries in the second row of the matrix in 7.32, except possibly the diagonal
entry $a_{2,2}$, equal 0.

Continuing in this fashion, we see that all nondiagonal entries in the matrix
7.32 equal 0. Thus (b) holds, completing the proof that (a) implies (b).

Now suppose (b) holds, so T has a diagonal matrix with respect to some 
orthonormal basis of V. The matrix of T* (with respect to the same basis) is 
obtained by taking the conjugate transpose of the matrix of T; hence T* also has a 
diagonal matrix. Any two diagonal matrices commute; thus T commutes with T*, 
which means that T is normal. In other words, (a) holds, completing the proof 
that (b) implies (a). 

The equivalence of (b) and (c) follows from the definitions (also see 5.55). 
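The first step of the argument that (a) implies (b) compares $\|Te_1\|^2$ with $\|T^*e_1\|^2$ for an upper-triangular matrix. The sketch below illustrates that comparison on a small example; the specific matrix and helper names are hypothetical choices of ours. It shows that an upper-triangular matrix with a nonzero entry above the diagonal in its first row fails the norm equality and is indeed not normal.

```python
def adjoint(M):
    """Conjugate transpose of a square matrix stored as nested lists."""
    n = len(M)
    return [[complex(M[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def col_norm_sq(M, k):
    """Squared norm of column k of M, which equals ||M e_k||^2 in the standard basis."""
    return sum(abs(M[i][k]) ** 2 for i in range(len(M)))

upper = [[1, 2j], [0, 3]]                         # upper-triangular, a_{1,2} = 2i

norm_Te1 = col_norm_sq(upper, 0)                  # |a_{1,1}|^2
norm_Tstar_e1 = col_norm_sq(adjoint(upper), 0)    # |a_{1,1}|^2 + |a_{1,2}|^2

is_normal = matmul(upper, adjoint(upper)) == matmul(adjoint(upper), upper)
```

The gap between the two squared norms is exactly $|a_{1,2}|^2$, so equality forces $a_{1,2} = 0$; this is the mechanism the proof repeats row by row.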
See Exercises 13 and 20 for alternative proofs that (a) implies (b) in the
previous result. 

Exercises 14 and 15 interpret the real spectral theorem and the complex 
spectral theorem by expressing the domain space as an orthogonal direct sum of 
eigenspaces. 

See Exercise 16 for a version of the complex spectral theorem that applies 
simultaneously to more than one operator. 

The main conclusion of the complex spectral theorem is that every normal 
operator on a complex finite-dimensional inner product space is diagonalizable 
by an orthonormal basis, as illustrated by the next example. 


7.33 example: an orthonormal basis of eigenvectors for an operator 


Consider the operator $T \in \mathcal{L}(\mathbf{C}^2)$ defined by $T(w, z) = (2w - 3z, 3w + 2z)$.
The matrix of $T$ (with respect to the standard basis) is
$$\begin{pmatrix} 2 & -3 \\ 3 & 2 \end{pmatrix}.$$
As we saw in Example 7.19, $T$ is a normal operator.

As you can verify,
$$\frac{1}{\sqrt{2}}(i, 1), \quad \frac{1}{\sqrt{2}}(-i, 1)$$
is an orthonormal basis of $\mathbf{C}^2$ consisting of eigenvectors of $T$, and with respect to
this basis the matrix of $T$ is the diagonal matrix
$$\begin{pmatrix} 2 + 3i & 0 \\ 0 & 2 - 3i \end{pmatrix}.$$
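The claims of Example 7.33 can also be checked numerically. In the sketch below the helper names are our own, and the comparison uses floating-point tolerances rather than exact arithmetic; the operator, basis vectors, and eigenvalues are those of the example.

```python
import math

def T(w, z):
    """The operator of Example 7.33 on C^2."""
    return (2 * w - 3 * z, 3 * w + 2 * z)

def inner(u, v):
    """Standard inner product on C^2, conjugate-linear in the second slot."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

s = 1 / math.sqrt(2)
basis = [(1j * s, s), (-1j * s, s)]
eigenvalues = [2 + 3j, 2 - 3j]

# Largest coordinate of T v - lambda v for each basis vector
residuals = [
    max(abs(t - lam * x) for t, x in zip(T(*v), v))
    for v, lam in zip(basis, eigenvalues)
]

# Gram matrix of the basis; it should be the 2-by-2 identity matrix
gram = [[inner(u, v) for v in basis] for u in basis]
```

The residuals vanish up to rounding and the Gram matrix is the identity, which is what "orthonormal basis of eigenvectors" means here.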


Exercises 7B 


1 Prove that a normal operator on a complex inner product space is self-adjoint 
if and only if all its eigenvalues are real. 


This exercise strengthens the analogy (for normal operators) between self- 
adjoint operators and real numbers. 


2 Suppose $\mathbf{F} = \mathbf{C}$. Suppose $T \in \mathcal{L}(V)$ is normal and has only one eigenvalue.
Prove that $T$ is a scalar multiple of the identity operator.


3 Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$ is normal. Prove that the set of eigenvalues
of $T$ is contained in $\{0, 1\}$ if and only if there is a subspace $U$ of $V$ such that
$T = P_U$.


4 Prove that a normal operator on a complex inner product space is skew 
(meaning it equals the negative of its adjoint) if and only if all its eigenvalues 
are purely imaginary (meaning that they have real part equal to 0). 


5 Prove or give a counterexample: If $T \in \mathcal{L}(\mathbf{C}^3)$ is a diagonalizable operator,
then $T$ is normal (with respect to the usual inner product).

6 Suppose $V$ is a complex inner product space and $T \in \mathcal{L}(V)$ is a normal
operator such that $T^9 = T^8$. Prove that $T$ is self-adjoint and $T^2 = T$.

7 Give an example of an operator $T$ on a complex vector space such that
$T^9 = T^8$ but $T^2 \ne T$.

8 Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Prove that $T$ is normal if and only if every
eigenvector of $T$ is also an eigenvector of $T^*$.

9 Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Prove that $T$ is normal if and only if there
exists a polynomial $p \in \mathcal{P}(\mathbf{C})$ such that $T^* = p(T)$.

10 Suppose $V$ is a complex inner product space. Prove that every normal
operator on $V$ has a square root.

An operator $S \in \mathcal{L}(V)$ is called a square root of $T \in \mathcal{L}(V)$ if $S^2 = T$. We
will discuss more about square roots of operators in Sections 7C and 8C.

11 Prove that every self-adjoint operator on $V$ has a cube root.

An operator $S \in \mathcal{L}(V)$ is called a cube root of $T \in \mathcal{L}(V)$ if $S^3 = T$.

12 Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$ is normal. Prove that
if $S$ is an operator on $V$ that commutes with $T$, then $S$ commutes with $T^*$.

The result in this exercise is called Fuglede's theorem.

13 Without using the complex spectral theorem, use the version of Schur's
theorem that applies to two commuting operators (take $\mathcal{E} = \{T, T^*\}$ in
Exercise 20 in Section 6B) to give a different proof that if $\mathbf{F} = \mathbf{C}$ and
$T \in \mathcal{L}(V)$ is normal, then $T$ has a diagonal matrix with respect to some
orthonormal basis of $V$.

14 Suppose $\mathbf{F} = \mathbf{R}$ and $T \in \mathcal{L}(V)$. Prove that $T$ is self-adjoint if and only
if all pairs of eigenvectors corresponding to distinct eigenvalues of $T$ are
orthogonal and $V = E(\lambda_1, T) \oplus \cdots \oplus E(\lambda_m, T)$, where $\lambda_1, \dots, \lambda_m$ denote the
distinct eigenvalues of $T$.

15 Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Prove that $T$ is normal if and only if all pairs
of eigenvectors corresponding to distinct eigenvalues of $T$ are orthogonal
and $V = E(\lambda_1, T) \oplus \cdots \oplus E(\lambda_m, T)$, where $\lambda_1, \dots, \lambda_m$ denote the distinct
eigenvalues of $T$.

16 Suppose $\mathbf{F} = \mathbf{C}$ and $\mathcal{E} \subseteq \mathcal{L}(V)$. Prove that there is an orthonormal basis
of $V$ with respect to which every element of $\mathcal{E}$ has a diagonal matrix if and
only if $S$ and $T$ are commuting normal operators for all $S, T \in \mathcal{E}$.

This exercise extends the complex spectral theorem to the context of a
collection of commuting normal operators.
17 Suppose $\mathbf{F} = \mathbf{R}$ and $\mathcal{E} \subseteq \mathcal{L}(V)$. Prove that there is an orthonormal basis
of $V$ with respect to which every element of $\mathcal{E}$ has a diagonal matrix if and
only if $S$ and $T$ are commuting self-adjoint operators for all $S, T \in \mathcal{E}$.

This exercise extends the real spectral theorem to the context of a collection
of commuting self-adjoint operators.

18 Give an example of a real inner product space $V$, an operator $T \in \mathcal{L}(V)$,
and real numbers $b, c$ with $b^2 < 4c$ such that
$$T^2 + bT + cI$$
is not invertible.

This exercise shows that the hypothesis that $T$ is self-adjoint cannot be
deleted in 7.26, even for real vector spaces.

19 Suppose $T \in \mathcal{L}(V)$ is self-adjoint and $U$ is a subspace of $V$ that is invariant
under $T$.

(a) Prove that $U^\perp$ is invariant under $T$.

(b) Prove that $T|_U \in \mathcal{L}(U)$ is self-adjoint.

(c) Prove that $T|_{U^\perp} \in \mathcal{L}(U^\perp)$ is self-adjoint.

20 Suppose $T \in \mathcal{L}(V)$ is normal and $U$ is a subspace of $V$ that is invariant
under $T$.

(a) Prove that $U^\perp$ is invariant under $T$.

(b) Prove that $U$ is invariant under $T^*$.

(c) Prove that $(T|_U)^* = (T^*)|_U$.

(d) Prove that $T|_U \in \mathcal{L}(U)$ and $T|_{U^\perp} \in \mathcal{L}(U^\perp)$ are normal operators.

This exercise can be used to give yet another proof of the complex spectral
theorem (use induction on $\dim V$ and the result that $T$ has an eigenvector).

21 Suppose that $T$ is a self-adjoint operator on a finite-dimensional inner product
space and that 2 and 3 are the only eigenvalues of $T$. Prove that
$$T^2 - 5T + 6I = 0.$$

22 Give an example of an operator $T \in \mathcal{L}(\mathbf{C}^3)$ such that 2 and 3 are the only
eigenvalues of $T$ and $T^2 - 5T + 6I \ne 0$.

23 Suppose $T \in \mathcal{L}(V)$ is self-adjoint, $\lambda \in \mathbf{F}$, and $\epsilon > 0$. Suppose there exists
$v \in V$ such that $\|v\| = 1$ and
$$\|Tv - \lambda v\| < \epsilon.$$
Prove that $T$ has an eigenvalue $\lambda'$ such that $|\lambda - \lambda'| < \epsilon$.

This exercise shows that for a self-adjoint operator, a number that is close
to satisfying an equation that would make it an eigenvalue is close to an
eigenvalue.

24 Suppose $U$ is a finite-dimensional vector space and $T \in \mathcal{L}(U)$.

(a) Suppose $\mathbf{F} = \mathbf{R}$. Prove that $T$ is diagonalizable if and only if there is a
basis of $U$ such that the matrix of $T$ with respect to this basis equals its
transpose.

(b) Suppose $\mathbf{F} = \mathbf{C}$. Prove that $T$ is diagonalizable if and only if there is a
basis of $U$ such that the matrix of $T$ with respect to this basis commutes
with its conjugate transpose.

This exercise adds another equivalence to the list of conditions equivalent
to diagonalizability in 5.55.

25 Suppose that $T \in \mathcal{L}(V)$ and there is an orthonormal basis $e_1, \dots, e_n$ of $V$
consisting of eigenvectors of $T$, with corresponding eigenvalues $\lambda_1, \dots, \lambda_n$.
Show that if $k \in \{1, \dots, n\}$, then the pseudoinverse $T^\dagger$ satisfies the equation
$$T^\dagger e_k = \begin{cases} \dfrac{1}{\lambda_k} e_k & \text{if } \lambda_k \ne 0, \\ 0 & \text{if } \lambda_k = 0. \end{cases}$$
7C Positive Operators


7.34 definition: positive operator 


An operator $T \in \mathcal{L}(V)$ is called positive if $T$ is self-adjoint and

$$\langle Tv, v\rangle \ge 0$$

for all $v \in V$.


If V is a complex vector space, then the requirement that T be self-adjoint can 
be dropped from the definition above (by 7.14). 


7.35 example: positive operators 


(a) Let $T \in \mathcal{L}(\mathbf{F}^2)$ be the operator whose matrix (using the standard basis) is
$\begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$. Then $T$ is self-adjoint and
$\langle T(w, z), (w, z)\rangle = 2|w|^2 - 2\operatorname{Re}(w\overline{z}) + |z|^2 = |w - z|^2 + |w|^2 \ge 0$
for all $(w, z) \in \mathbf{F}^2$. Thus $T$ is a positive operator.

(b) If $W$ is a subspace of $V$, then the orthogonal projection $P_W$ is a positive operator,
as you should verify.

(c) If $T \in \mathcal{L}(V)$ is self-adjoint and $b, c \in \mathbf{R}$ are such that $b^2 < 4c$, then $T^2 + bT + cI$
is a positive operator, as shown by the proof of 7.26.
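For part (a) of the example above, the computation can be written out in full. The following derivation is a worked expansion added for convenience; it uses only the matrix given in (a) and the standard inner product on $\mathbf{F}^2$.

```latex
\begin{align*}
\langle T(w,z),(w,z)\rangle
  &= \langle (2w - z,\; -w + z),\,(w,z)\rangle \\
  &= (2w - z)\overline{w} + (-w + z)\overline{z} \\
  &= 2|w|^2 - (z\overline{w} + w\overline{z}) + |z|^2 \\
  &= 2|w|^2 - 2\operatorname{Re}(w\overline{z}) + |z|^2 \\
  &= |w|^2 + \bigl(|w|^2 - 2\operatorname{Re}(w\overline{z}) + |z|^2\bigr) \\
  &= |w|^2 + |w - z|^2 \;\ge\; 0.
\end{align*}
```

The fourth line uses $z\overline{w} + w\overline{z} = 2\operatorname{Re}(w\overline{z})$, and the last line uses $|w - z|^2 = |w|^2 - 2\operatorname{Re}(w\overline{z}) + |z|^2$.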


7.36 definition: square root 
An operator R is called a square root of an operator T if R? = T. 


7.37 example: square root of an operator 


If $T \in \mathcal{L}(\mathbf{F}^3)$ is defined by $T(z_1, z_2, z_3) = (z_3, 0, 0)$, then the operator
$R \in \mathcal{L}(\mathbf{F}^3)$ defined by $R(z_1, z_2, z_3) = (z_2, z_3, 0)$ is a square root of $T$ because
$R^2 = T$, as you can verify.
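The identity $R^2 = T$ in this example can be confirmed mechanically. The sketch below (function and variable names are ours) applies $R$ twice to each standard basis vector of $\mathbf{F}^3$; because both maps are linear, agreement on a basis is enough.

```python
def T(z1, z2, z3):
    return (z3, 0, 0)

def R(z1, z2, z3):
    return (z2, z3, 0)

standard_basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

# R(R(e)) compared with T(e) on each basis vector e
r_squared_equals_t = all(R(*R(*e)) == T(*e) for e in standard_basis)
```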


The characterizations of the positive operators in the next result correspond
to characterizations of the nonnegative numbers among $\mathbf{C}$. Specifically, a number
$z \in \mathbf{C}$ is nonnegative if and only if it has a nonnegative square root,
corresponding to condition (d). Also, $z$ is nonnegative if and only if it has a real
square root, corresponding to condition (e). Finally, $z$ is nonnegative if and only
if there exists $w \in \mathbf{C}$ such that $z = \overline{w}w$, corresponding to condition (f). See
Exercise 20 for another condition that is equivalent to being a positive operator.

Because positive operators correspond to nonnegative numbers, better terminology
would use the term nonnegative operators. However, operator theorists
consistently call these positive operators, so we follow that custom. Some
mathematicians use the term positive semidefinite operator, which means
the same as positive operator.
7.38 characterization of positive operators


Let $T \in \mathcal{L}(V)$. Then the following are equivalent.

(a) $T$ is a positive operator.

(b) $T$ is self-adjoint and all eigenvalues of $T$ are nonnegative.

(c) With respect to some orthonormal basis of $V$, the matrix of $T$ is a diagonal
matrix with only nonnegative numbers on the diagonal.

(d) $T$ has a positive square root.

(e) $T$ has a self-adjoint square root.

(f) $T = R^*R$ for some $R \in \mathcal{L}(V)$.


Proof We will prove that (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (d) $\Rightarrow$ (e) $\Rightarrow$ (f) $\Rightarrow$ (a).

First suppose (a) holds, so that $T$ is positive, which implies that $T$ is self-adjoint
(by definition of positive operator). To prove the other condition in (b), suppose
$\lambda$ is an eigenvalue of $T$. Let $v$ be an eigenvector of $T$ corresponding to $\lambda$. Then

$$0 \le \langle Tv, v\rangle = \langle \lambda v, v\rangle = \lambda\langle v, v\rangle.$$

Thus $\lambda$ is a nonnegative number. Hence (b) holds, showing that (a) implies (b).

Now suppose (b) holds, so that $T$ is self-adjoint and all eigenvalues of $T$ are
nonnegative. By the spectral theorem (7.29 and 7.31), there is an orthonormal
basis $e_1, \dots, e_n$ of $V$ consisting of eigenvectors of $T$. Let $\lambda_1, \dots, \lambda_n$ be the
eigenvalues of $T$ corresponding to $e_1, \dots, e_n$; thus each $\lambda_k$ is a nonnegative number. The
matrix of $T$ with respect to $e_1, \dots, e_n$ is the diagonal matrix with $\lambda_1, \dots, \lambda_n$ on the
diagonal, which shows that (b) implies (c).

Now suppose (c) holds. Suppose $e_1, \dots, e_n$ is an orthonormal basis of $V$ such
that the matrix of $T$ with respect to this basis is a diagonal matrix with nonnegative
numbers $\lambda_1, \dots, \lambda_n$ on the diagonal. The linear map lemma (3.4) implies that
there exists $R \in \mathcal{L}(V)$ such that

$$Re_k = \sqrt{\lambda_k}\, e_k$$

for each $k = 1, \dots, n$. As you should verify, $R$ is a positive operator. Furthermore,
$R^2e_k = \lambda_k e_k = Te_k$ for each $k$, which implies that $R^2 = T$. Thus $R$ is a positive
square root of $T$. Hence (d) holds, which shows that (c) implies (d).

Every positive operator is self-adjoint (by definition of positive operator).
Thus (d) implies (e).

Now suppose (e) holds, meaning that there exists a self-adjoint operator $R$ on
$V$ such that $T = R^2$. Then $T = R^*R$ (because $R^* = R$). Hence (e) implies (f).

Finally, suppose (f) holds. Let $R \in \mathcal{L}(V)$ be such that $T = R^*R$. Then
$T^* = (R^*R)^* = R^*(R^*)^* = R^*R = T$. Hence $T$ is self-adjoint. To complete the
proof that (a) holds, note that

$$\langle Tv, v\rangle = \langle R^*Rv, v\rangle = \langle Rv, Rv\rangle \ge 0$$

for every $v \in V$. Thus $T$ is positive, showing that (f) implies (a).
Every nonnegative number has a unique nonnegative square root. The next
result shows that positive operators enjoy a similar property. 


7.39 each positive operator has only one positive square root 


Every positive operator on V has a unique positive square root. 


Proof Suppose $T \in \mathcal{L}(V)$ is positive. Suppose $v \in V$ is an eigenvector of $T$.
Hence there exists a real number $\lambda \ge 0$ such that $Tv = \lambda v$.

A positive operator can have infinitely many square roots (although only one
of them can be positive). For example, the identity operator on $V$ has infinitely
many square roots if $\dim V > 1$.

Let $R$ be a positive square root of $T$. We will prove that $Rv = \sqrt{\lambda}\, v$. This will
imply that the behavior of $R$ on the eigenvectors of $T$ is uniquely determined.
Because there is a basis of $V$ consisting of eigenvectors of $T$ (by the spectral
theorem), this will imply that $R$ is uniquely determined.

To prove that $Rv = \sqrt{\lambda}\, v$, note that the spectral theorem asserts that there is an
orthonormal basis $e_1, \dots, e_n$ of $V$ consisting of eigenvectors of $R$. Because $R$ is a
positive operator, all its eigenvalues are nonnegative. Thus there exist nonnegative
numbers $\lambda_1, \dots, \lambda_n$ such that $Re_k = \sqrt{\lambda_k}\, e_k$ for each $k = 1, \dots, n$.

Because $e_1, \dots, e_n$ is a basis of $V$, we can write

$$v = a_1e_1 + \cdots + a_ne_n$$

for some numbers $a_1, \dots, a_n \in \mathbf{F}$. Thus
$$Rv = a_1\sqrt{\lambda_1}\, e_1 + \cdots + a_n\sqrt{\lambda_n}\, e_n.$$
Hence
$$\lambda v = Tv = R^2v = a_1\lambda_1e_1 + \cdots + a_n\lambda_ne_n.$$
The equation above implies that
$$a_1\lambda e_1 + \cdots + a_n\lambda e_n = a_1\lambda_1e_1 + \cdots + a_n\lambda_ne_n.$$
Thus $a_k(\lambda - \lambda_k) = 0$ for each $k = 1, \dots, n$. Hence
$$v = \sum_{\{k \,:\, \lambda_k = \lambda\}} a_ke_k.$$
Thus
$$Rv = \sum_{\{k \,:\, \lambda_k = \lambda\}} a_k\sqrt{\lambda}\, e_k = \sqrt{\lambda}\, v,$$
as desired. 
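The remark that the identity operator has infinitely many square roots when $\dim V > 1$ can be illustrated on $\mathbf{F}^2$. The family of matrices in the sketch below is one hypothetical choice of ours, not taken from the text: each member squares to the identity matrix, yet none of them is positive, consistent with the uniqueness of the positive square root.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

identity = [[1, 0], [0, 1]]

def root(t):
    """A one-parameter family of square roots of the identity (our own example)."""
    return [[1, t], [0, -1]]

parameters = range(-3, 4)

# Each root(t) squares to the identity, and distinct t give distinct matrices.
all_square_to_identity = all(matmul(root(t), root(t)) == identity for t in parameters)

# <R e_2, e_2> is the lower-right entry; it is negative, so no root(t) is positive.
forms_at_e2 = [root(t)[1][1] for t in parameters]
```

Since the parameter `t` can be any scalar, this already gives infinitely many square roots; the identity matrix itself is the only positive one.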


The notation defined below makes sense thanks to the result above. 


7.40 notation: $\sqrt{T}$

For $T$ a positive operator, $\sqrt{T}$ denotes the unique positive square root of $T$.
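7.41 example: square root of positive operators


Define operators $S, T$ on $\mathbf{R}^2$ (with the usual Euclidean inner product) by
$$S(x, y) = (x, 2y) \quad\text{and}\quad T(x, y) = (x + y, x + y).$$
Then with respect to the standard basis of $\mathbf{R}^2$ we have

7.42    $$\mathcal{M}(S) = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} \quad\text{and}\quad \mathcal{M}(T) = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$

Each of these matrices equals its transpose; thus $S$ and $T$ are self-adjoint.
If $(x, y) \in \mathbf{R}^2$, then
$$\langle S(x, y), (x, y)\rangle = x^2 + 2y^2 \ge 0$$
and
$$\langle T(x, y), (x, y)\rangle = x^2 + 2xy + y^2 = (x + y)^2 \ge 0.$$
Thus $S$ and $T$ are positive operators.

The standard basis of $\mathbf{R}^2$ is an orthonormal basis consisting of eigenvectors of
$S$. Note that
$$\Bigl(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\Bigr), \quad \Bigl(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\Bigr)$$
is an orthonormal basis of eigenvectors of $T$, with eigenvalue 2 for the first
eigenvector and eigenvalue 0 for the second eigenvector. Thus $\sqrt{T}$ has the same
eigenvectors, with eigenvalues $\sqrt{2}$ and 0.

You can verify that
$$\mathcal{M}\bigl(\sqrt{S}\bigr) = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{2} \end{pmatrix} \quad\text{and}\quad \mathcal{M}\bigl(\sqrt{T}\bigr) = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}$$
with respect to the standard basis by showing that the squares of the matrices
above are the matrices in 7.42 and that each matrix above is the matrix of a positive
operator.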
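The verification suggested at the end of Example 7.41 can be done numerically. In the sketch below the helper names and the sample vectors are our own; the spot check of the quadratic forms on a few vectors is only a sanity check, not a proof of positivity.

```python
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def max_diff(X, Y):
    n = len(X)
    return max(abs(X[i][j] - Y[i][j]) for i in range(n) for j in range(n))

def form(M, x, y):
    """Return <M(x, y), (x, y)> for the Euclidean inner product on R^2."""
    return (M[0][0] * x + M[0][1] * y) * x + (M[1][0] * x + M[1][1] * y) * y

S = [[1, 0], [0, 2]]
T = [[1, 1], [1, 1]]

r = 1 / math.sqrt(2)
sqrt_S = [[1, 0], [0, math.sqrt(2)]]
sqrt_T = [[r, r], [r, r]]

# Squares of the candidate roots compared with the matrices in 7.42
err_S = max_diff(matmul(sqrt_S, sqrt_S), S)
err_T = max_diff(matmul(sqrt_T, sqrt_T), T)

# Spot check of nonnegativity of the quadratic forms (tolerance for rounding)
samples = [(1, 0), (0, 1), (1, -1), (-2, 3), (0.5, 0.25)]
forms_nonnegative = all(
    form(M, x, y) >= -1e-12 for M in (sqrt_S, sqrt_T) for x, y in samples
)
```

Both candidate matrices are symmetric, and their quadratic forms are $x^2 + \sqrt{2}\,y^2$ and $(x + y)^2/\sqrt{2}$, which are nonnegative for all real $x, y$.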


The statement of the next result does not involve a square root, but the clean 
proof makes nice use of the square root of a positive operator. 


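7.43 $T$ positive and $\langle Tv, v\rangle = 0 \implies Tv = 0$


Suppose $T$ is a positive operator on $V$ and $v \in V$ is such that $\langle Tv, v\rangle = 0$.
Then $Tv = 0$.


Proof We have
$$0 = \langle Tv, v\rangle = \langle \sqrt{T}\sqrt{T}v, v\rangle = \langle \sqrt{T}v, \sqrt{T}v\rangle = \|\sqrt{T}v\|^2.$$

Hence $\sqrt{T}v = 0$. Thus $Tv = \sqrt{T}(\sqrt{T}v) = 0$, as desired.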
Exercises 7C

1 Suppose $T \in \mathcal{L}(V)$. Prove that if both $T$ and $-T$ are positive operators, then
$T = 0$.

2 Suppose $T \in \mathcal{L}(\mathbf{F}^4)$ is the operator whose matrix (with respect to the
standard basis) is
$$\begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix}.$$
Show that $T$ is an invertible positive operator.

3 Suppose $n$ is a positive integer and $T \in \mathcal{L}(\mathbf{F}^n)$ is the operator whose matrix
(with respect to the standard basis) consists of all 1's. Show that $T$ is a
positive operator.

4 Suppose $n$ is an integer with $n > 1$. Show that there exists an $n$-by-$n$ matrix
$A$ such that all of the entries of $A$ are positive numbers and $A = A^*$, but the
operator on $\mathbf{F}^n$ whose matrix (with respect to the standard basis) equals $A$ is
not a positive operator.

5 Suppose $T \in \mathcal{L}(V)$ is self-adjoint. Prove that $T$ is a positive operator if and
only if for every orthonormal basis $e_1, \dots, e_n$ of $V$, all entries on the diagonal
of $\mathcal{M}\bigl(T, (e_1, \dots, e_n)\bigr)$ are nonnegative numbers.

6 Prove that the sum of two positive operators on $V$ is a positive operator.

7 Suppose $S \in \mathcal{L}(V)$ is an invertible positive operator and $T \in \mathcal{L}(V)$ is a
positive operator. Prove that $S + T$ is invertible.

8 Suppose $T \in \mathcal{L}(V)$. Prove that $T$ is a positive operator if and only if the
pseudoinverse $T^\dagger$ is a positive operator.

9 Suppose $T \in \mathcal{L}(V)$ is a positive operator and $S \in \mathcal{L}(W, V)$. Prove that
$S^*TS$ is a positive operator on $W$.

10 Suppose $T$ is a positive operator on $V$. Suppose $v, w \in V$ are such that
$$Tv = w \quad\text{and}\quad Tw = v.$$
Prove that $v = w$.

11 Suppose $T$ is a positive operator on $V$ and $U$ is a subspace of $V$ invariant
under $T$. Prove that $T|_U \in \mathcal{L}(U)$ is a positive operator on $U$.

12 Suppose $T \in \mathcal{L}(V)$ is a positive operator. Prove that $T^k$ is a positive operator
for every positive integer $k$.
13 Suppose $T \in \mathcal{L}(V)$ is self-adjoint and $a \in \mathbf{R}$.

(a) Prove that $T - aI$ is a positive operator if and only if $a$ is less than or
equal to every eigenvalue of $T$.

(b) Prove that $aI - T$ is a positive operator if and only if $a$ is greater than or
equal to every eigenvalue of $T$.

14 Suppose $T$ is a positive operator on $V$ and $v_1, \dots, v_m \in V$. Prove that
$$\sum_{j=1}^{m} \sum_{k=1}^{m} \langle Tv_k, v_j\rangle \ge 0.$$

15 Suppose $T \in \mathcal{L}(V)$ is self-adjoint. Prove that there exist positive operators
$A, B \in \mathcal{L}(V)$ such that
$$T = A - B \quad\text{and}\quad \sqrt{T^*T} = A + B \quad\text{and}\quad AB = BA = 0.$$

16 Suppose $T$ is a positive operator on $V$. Prove that
$$\operatorname{null} \sqrt{T} = \operatorname{null} T \quad\text{and}\quad \operatorname{range} \sqrt{T} = \operatorname{range} T.$$

17 Suppose that $T \in \mathcal{L}(V)$ is a positive operator. Prove that there exists a
polynomial $p$ with real coefficients such that $\sqrt{T} = p(T)$.

18 Suppose $S$ and $T$ are positive operators on $V$. Prove that $ST$ is a positive
operator if and only if $S$ and $T$ commute.

19 Show that the identity operator on $\mathbf{F}^2$ has infinitely many self-adjoint square
roots.

20 Suppose $T \in \mathcal{L}(V)$ and $e_1, \dots, e_n$ is an orthonormal basis of $V$. Prove that $T$
is a positive operator if and only if there exist $v_1, \dots, v_n \in V$ such that
$$\langle Te_k, e_j\rangle = \langle v_k, v_j\rangle$$
for all $j, k = 1, \dots, n$.

The numbers $\{\langle Te_k, e_j\rangle\}_{j,k = 1, \dots, n}$ are the entries in the matrix of $T$ with
respect to the orthonormal basis $e_1, \dots, e_n$.

21 Suppose $n$ is a positive integer. The $n$-by-$n$ Hilbert matrix is the $n$-by-$n$
matrix whose entry in row $j$, column $k$ is $\frac{1}{j + k - 1}$. Suppose $T \in \mathcal{L}(V)$ is an
operator whose matrix with respect to some orthonormal basis of $V$ is the
$n$-by-$n$ Hilbert matrix. Prove that $T$ is a positive invertible operator.

Example: The 4-by-4 Hilbert matrix is
$$\begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} \\ \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} \end{pmatrix}.$$
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Suppose T € L(V) is a positive operator and u € V is such that ||u|| = 1 
and ||Tu|| > ||Tv|| for all v € V with ||v|| = 1. Show that u is an eigenvector 
of T corresponding to the largest eigenvalue of T. 


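For $T \in \mathcal{L}(V)$ and $u, v \in V$, define $\langle u, v \rangle_T$ by $\langle u, v \rangle_T = \langle Tu, v \rangle$.


(a) Suppose $T \in \mathcal{L}(V)$. Prove that $\langle \cdot, \cdot \rangle_T$ is an inner product on $V$ if and
only if $T$ is an invertible positive operator (with respect to the original
inner product $\langle \cdot, \cdot \rangle$).

(b) Prove that every inner product on $V$ is of the form $\langle \cdot, \cdot \rangle_T$ for some positive
invertible operator $T \in \mathcal{L}(V)$.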


Suppose $S$ and $T$ are positive operators on $V$. Prove that
\[ \operatorname{null}(S + T) = \operatorname{null} S \cap \operatorname{null} T. \]


Let $T$ be the second derivative operator in Exercise 31(b) in Section 7A.
Show that $-T$ is a positive operator.


Isometries


Linear maps that preserve norms are sufficiently important to deserve a name. 


7.44 definition: isometry 


A linear map $S \in \mathcal{L}(V, W)$ is called an isometry if
\[ \|Sv\| = \|v\| \]
for every $v \in V$. In other words, a linear map is an isometry if it preserves
norms.


If $S \in \mathcal{L}(V, W)$ is an isometry and $v \in V$ is such that $Sv = 0$, then
\[ \|v\| = \|Sv\| = \|0\| = 0, \]
which implies that $v = 0$. Thus every isometry is injective.

The Greek word isos means equal; the Greek word metron means measure.
Thus isometry literally means equal measure.


7.45 example: orthonormal basis maps to orthonormal list $\Longrightarrow$ isometry


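Suppose $e_1, \dots, e_n$ is an orthonormal basis of $V$ and $g_1, \dots, g_n$ is an orthonormal
list in $W$. Let $S \in \mathcal{L}(V, W)$ be the linear map such that $Se_k = g_k$ for each
$k = 1, \dots, n$. To show that $S$ is an isometry, suppose $v \in V$. Then

7.46 \[ v = \langle v, e_1 \rangle e_1 + \cdots + \langle v, e_n \rangle e_n \]
and
7.47 \[ \|v\|^2 = |\langle v, e_1 \rangle|^2 + \cdots + |\langle v, e_n \rangle|^2, \]

where we have used 6.30(b). Applying $S$ to both sides of 7.46 gives
\[ Sv = \langle v, e_1 \rangle Se_1 + \cdots + \langle v, e_n \rangle Se_n = \langle v, e_1 \rangle g_1 + \cdots + \langle v, e_n \rangle g_n. \]

Thus

7.48 \[ \|Sv\|^2 = |\langle v, e_1 \rangle|^2 + \cdots + |\langle v, e_n \rangle|^2. \]

Comparing 7.47 and 7.48 shows that $\|v\| = \|Sv\|$. Thus $S$ is an isometry.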


The next result gives conditions equivalent to being an isometry. The equiv- 
alence of (a) and (c) shows that a linear map is an isometry if and only if it 
preserves inner products. The equivalence of (a) and (d) shows that a linear map 
is an isometry if and only if it maps some orthonormal basis to an orthonormal list. 
Thus the isometries given by Example 7.45 include all isometries. Furthermore, 
a linear map is an isometry if and only if it maps every orthonormal basis to an 
orthonormal list [because whether or not (a) holds does not depend on the basis
$e_1, \dots, e_n$].

The equivalence of (a) and (e) in the next result shows that a linear map is an
isometry if and only if the columns of its matrix (with respect to any orthonormal
bases) form an orthonormal list. Here we are identifying the columns of an $m$-by-$n$
matrix with elements of $\mathbf{F}^m$ and then using the Euclidean inner product on $\mathbf{F}^m$.


7.49 characterization of isometries 


Suppose $S \in \mathcal{L}(V, W)$. Suppose $e_1, \dots, e_n$ is an orthonormal basis of $V$ and
$f_1, \dots, f_m$ is an orthonormal basis of $W$. Then the following are equivalent.

(a) $S$ is an isometry.

(b) $S^*S = I$.

(c) $\langle Su, Sv \rangle = \langle u, v \rangle$ for all $u, v \in V$.

(d) $Se_1, \dots, Se_n$ is an orthonormal list in $W$.

(e) The columns of $\mathcal{M}(S, (e_1, \dots, e_n), (f_1, \dots, f_m))$ form an orthonormal list
in $\mathbf{F}^m$ with respect to the Euclidean inner product.


Proof First suppose (a) holds, so $S$ is an isometry. If $v \in V$ then
\[ \langle (I - S^*S)v, v \rangle = \langle v, v \rangle - \langle S^*Sv, v \rangle = \|v\|^2 - \langle Sv, Sv \rangle = \|v\|^2 - \|Sv\|^2 = 0. \]

Hence the self-adjoint operator $I - S^*S$ equals 0 (by 7.16). Thus $S^*S = I$, proving
that (a) implies (b).

Now suppose (b) holds, so $S^*S = I$. If $u, v \in V$ then
\[ \langle Su, Sv \rangle = \langle S^*Su, v \rangle = \langle Iu, v \rangle = \langle u, v \rangle, \]
proving that (b) implies (c).

Now suppose that (c) holds, so $\langle Su, Sv \rangle = \langle u, v \rangle$ for all $u, v \in V$. Thus if
$j, k \in \{1, \dots, n\}$, then
\[ \langle Se_j, Se_k \rangle = \langle e_j, e_k \rangle. \]
Hence $Se_1, \dots, Se_n$ is an orthonormal list in $W$, proving that (c) implies (d).

Now suppose that (d) holds, so $Se_1, \dots, Se_n$ is an orthonormal list in $W$. Let
$A = \mathcal{M}(S, (e_1, \dots, e_n), (f_1, \dots, f_m))$. If $k, r \in \{1, \dots, n\}$, then

7.50 \[ \sum_{j=1}^{m} A_{j,k} \overline{A_{j,r}} = \Big\langle \sum_{j=1}^{m} A_{j,k} f_j, \sum_{j=1}^{m} A_{j,r} f_j \Big\rangle = \langle Se_k, Se_r \rangle = \begin{cases} 1 & \text{if } k = r, \\ 0 & \text{if } k \ne r. \end{cases} \]

The left side of 7.50 is the inner product in $\mathbf{F}^m$ of columns $k$ and $r$ of $A$. Thus the
columns of $A$ form an orthonormal list in $\mathbf{F}^m$, proving that (d) implies (e).

Now suppose (e) holds, so the columns of the matrix $A$ defined in the paragraph
above form an orthonormal list in $\mathbf{F}^m$. Then 7.50 shows that $Se_1, \dots, Se_n$ is an
orthonormal list in $W$. Thus Example 7.45, with $Se_1, \dots, Se_n$ playing the role of
$g_1, \dots, g_n$, shows that $S$ is an isometry, proving that (e) implies (a).
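The conditions in 7.49 can be checked numerically on a small example. The sketch below is only an illustration, using a hypothetical 3-by-2 matrix (not taken from the text) whose columns are orthonormal; the helper names are assumptions made for this example. It also shows that for a map into a larger space, $S^*S = I$ can hold even though $SS^*$ is not the identity.

```python
import math

def conj_transpose(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(a[r][c]).conjugate() for r in range(len(a))]
            for c in range(len(a[0]))]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def is_identity(a, tol=1e-12):
    """True if the square matrix a equals the identity up to tol."""
    return all(abs(a[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(a)) for j in range(len(a)))

def norm(v):
    """Euclidean norm of a vector with real or complex entries."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

# Hypothetical matrix of an isometry from F^2 to F^3: its two columns,
# (s, s, 0) and (0, 0, 1), form an orthonormal list in F^3.
s = 1 / math.sqrt(2)
S = [[s, 0], [s, 0], [0, 1]]

v = [3, -4j]                                   # an arbitrary test vector
Sv = [sum(row[k] * v[k] for k in range(2)) for row in S]

sts_is_identity = is_identity(matmul(conj_transpose(S), S))   # condition (b)
sst_is_identity = is_identity(matmul(S, conj_transpose(S)))   # fails here
norm_preserved = abs(norm(Sv) - norm(v)) < 1e-12              # condition (a) for v
print(sts_is_identity, sst_is_identity, norm_preserved)
```

Only a single test vector is checked here, so this is a sanity check of the statement rather than a proof of it.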


See Exercises 1 and 11 for additional conditions that are equivalent to being
an isometry.


Unitary Operators


In this subsection, we confine our attention to linear maps from a vector space to 
itself. In other words, we will be working with operators. 


7.51 definition: unitary operator 


An operator $S \in \mathcal{L}(V)$ is called unitary if $S$ is an invertible isometry.


As previously noted, every isometry is injective. Every injective operator on
a finite-dimensional vector space is invertible (see 3.65). A standing assumption
for this chapter is that $V$ is a finite-dimensional inner product space. Thus
we could delete the word “invertible” from the definition above without changing
the meaning. The unnecessary word “invertible” has been retained in the
definition above for consistency with the definition readers may encounter when
learning about inner product spaces that are not necessarily finite-dimensional.

Although the words “unitary” and “isometry” mean the same thing for
operators on finite-dimensional inner product spaces, remember that a unitary
operator maps a vector space to itself, while an isometry maps a vector
space to another (possibly different) vector space.


7.52 example: rotation of $\mathbf{R}^2$


Suppose $\theta \in \mathbf{R}$ and $S$ is the operator on $\mathbf{F}^2$ whose matrix with respect to the
standard basis of $\mathbf{F}^2$ is
\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]

The two columns of this matrix form an orthonormal list in $\mathbf{F}^2$; hence $S$ is an
isometry [by the equivalence of (a) and (e) in 7.49]. Thus $S$ is a unitary operator.

If $\mathbf{F} = \mathbf{R}$, then $S$ is the operator of counterclockwise rotation by $\theta$ radians
around the origin of $\mathbf{R}^2$. This observation gives us another way to think about why
$S$ is an isometry, because each rotation around the origin of $\mathbf{R}^2$ preserves norms.
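The claims in Example 7.52 can be sanity-checked numerically. The sketch below is an illustration only: the angle and the test vector are arbitrary choices, and the helper names are assumptions made for this example. It checks that the columns (and also the rows) of the rotation matrix are orthonormal and that the norm of one complex vector is preserved.

```python
import math

def inner(u, v):
    """Euclidean inner product on F^n: linear in u, conjugate-linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def is_orthonormal(vectors, tol=1e-12):
    """True if the list of vectors is orthonormal up to tol."""
    return all(abs(inner(u, w) - (1 if j == k else 0)) < tol
               for j, u in enumerate(vectors)
               for k, w in enumerate(vectors))

theta = 0.7                      # arbitrary angle in radians
c, s = math.cos(theta), math.sin(theta)
S = [[c, -s],
     [s, c]]                     # the matrix from Example 7.52

columns = [list(col) for col in zip(*S)]
rows = [list(row) for row in S]

v = [2 - 1j, 0.5j]               # arbitrary vector in C^2
Sv = [sum(S[i][k] * v[k] for k in range(2)) for i in range(2)]
norm_v = math.sqrt(inner(v, v).real)
norm_Sv = math.sqrt(inner(Sv, Sv).real)

columns_ok = is_orthonormal(columns)
rows_ok = is_orthonormal(rows)
norm_ok = abs(norm_Sv - norm_v) < 1e-12
print(columns_ok, rows_ok, norm_ok)
```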


The next result (7.53) lists several conditions that are equivalent to being a 
unitary operator. All the conditions equivalent to being an isometry in 7.49 should 
be added to this list. The extra conditions in 7.53 arise because of limiting the 
context to linear maps from a vector space to itself. For example, 7.49 shows that
a linear map $S \in \mathcal{L}(V, W)$ is an isometry if and only if $S^*S = I$, while 7.53 shows
that an operator $S \in \mathcal{L}(V)$ is a unitary operator if and only if $S^*S = SS^* = I$.

Another difference is that 7.49(d) mentions an orthonormal list, while 7.53(d)
mentions an orthonormal basis. Also, 7.49(e) mentions the columns of $\mathcal{M}(T)$,
while 7.53(e) mentions the rows of $\mathcal{M}(T)$. Furthermore, $\mathcal{M}(T)$ in 7.49(e) is with
respect to an orthonormal basis of $V$ and an orthonormal basis of $W$, while $\mathcal{M}(T)$
in 7.53(e) is with respect to a single basis of $V$ doing double duty.


7.53 characterization of unitary operators


Suppose $S \in \mathcal{L}(V)$. Suppose $e_1, \dots, e_n$ is an orthonormal basis of $V$. Then the
following are equivalent.

(a) $S$ is a unitary operator.

(b) $S^*S = SS^* = I$.

(c) $S$ is invertible and $S^{-1} = S^*$.

(d) $Se_1, \dots, Se_n$ is an orthonormal basis of $V$.

(e) The rows of $\mathcal{M}(S, (e_1, \dots, e_n))$ form an orthonormal basis of $\mathbf{F}^n$ with
respect to the Euclidean inner product.

(f) $S^*$ is a unitary operator.


Proof First suppose (a) holds, so $S$ is a unitary operator. Hence
\[ S^*S = I \]
by the equivalence of (a) and (b) in 7.49. Multiply both sides of this equation by
$S^{-1}$ on the right, getting $S^* = S^{-1}$. Thus $SS^* = SS^{-1} = I$, as desired, proving
that (a) implies (b).

The definitions of invertible and inverse show that (b) implies (c).

Now suppose (c) holds, so $S$ is invertible and $S^{-1} = S^*$. Thus $S^*S = I$. Hence
$Se_1, \dots, Se_n$ is an orthonormal list in $V$, by the equivalence of (b) and (d) in 7.49.
The length of this list equals $\dim V$. Thus $Se_1, \dots, Se_n$ is an orthonormal basis of $V$,
proving that (c) implies (d).

Now suppose (d) holds, so $Se_1, \dots, Se_n$ is an orthonormal basis of $V$. The
equivalence of (a) and (d) in 7.49 shows that $S$ is a unitary operator. Thus
\[ (S^*)^*S^* = SS^* = I, \]
where the last equation holds because we have already shown that (a) implies (b) in
this result. The equation above and the equivalence of (a) and (b) in 7.49 show that
$S^*$ is an isometry. Thus the columns of $\mathcal{M}(S^*, (e_1, \dots, e_n))$ form an orthonormal
basis of $\mathbf{F}^n$ [by the equivalence of (a) and (e) of 7.49]. The rows of $\mathcal{M}(S, (e_1, \dots, e_n))$
are the complex conjugates of the columns of $\mathcal{M}(S^*, (e_1, \dots, e_n))$. Thus the rows
of $\mathcal{M}(S, (e_1, \dots, e_n))$ form an orthonormal basis of $\mathbf{F}^n$, proving that (d) implies (e).

Now suppose (e) holds. Thus the columns of $\mathcal{M}(S^*, (e_1, \dots, e_n))$ form an
orthonormal basis of $\mathbf{F}^n$. The equivalence of (a) and (e) in 7.49 shows that $S^*$ is
an isometry, proving that (e) implies (f).

Now suppose (f) holds, so $S^*$ is a unitary operator. The chain of implications
we have already proved in this result shows that (a) implies (f). Applying this
result to $S^*$ shows that $(S^*)^*$ is a unitary operator, proving that (f) implies (a).

We have shown that (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (d) $\Rightarrow$ (e) $\Rightarrow$ (f) $\Rightarrow$ (a), completing the
proof.

Recall our analogy between $\mathbf{C}$ and $\mathcal{L}(V)$. Under this analogy, a complex
number $z$ corresponds to an operator $S \in \mathcal{L}(V)$, and $\bar{z}$ corresponds to $S^*$. The
real numbers ($z = \bar{z}$) correspond to the self-adjoint operators ($S = S^*$), and the
nonnegative numbers correspond to the (badly named) positive operators.

Another distinguished subset of $\mathbf{C}$ is the unit circle, which consists of the
complex numbers $z$ such that $|z| = 1$. The condition $|z| = 1$ is equivalent to the
condition $\bar{z}z = 1$. Under our analogy, this corresponds to the condition $S^*S = I$,
which is equivalent to S being a unitary operator. Hence the analogy shows that 
the unit circle in C corresponds to the set of unitary operators. In the next two 
results, this analogy appears in the eigenvalues of unitary operators. Also see 
Exercise 15 for another example of this analogy. 


7.54 eigenvalues of unitary operators have absolute value 1 


Suppose $\lambda$ is an eigenvalue of a unitary operator. Then $|\lambda| = 1$.


Proof Suppose $S \in \mathcal{L}(V)$ is a unitary operator and $\lambda$ is an eigenvalue of $S$. Let
$v \in V$ be such that $v \ne 0$ and $Sv = \lambda v$. Then
\[ |\lambda| \, \|v\| = \|\lambda v\| = \|Sv\| = \|v\|. \]
Thus $|\lambda| = 1$, as desired.
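
As a worked illustration of 7.54 (an added example, assuming $\mathbf{F} = \mathbf{C}$ and taking $S$ to be the operator on $\mathbf{C}^2$ whose matrix is the rotation matrix of Example 7.52), a direct computation gives

\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 \\ -i \end{pmatrix} = \begin{pmatrix} \cos\theta + i\sin\theta \\ \sin\theta - i\cos\theta \end{pmatrix} = e^{i\theta} \begin{pmatrix} 1 \\ -i \end{pmatrix}, \qquad \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 \\ i \end{pmatrix} = e^{-i\theta} \begin{pmatrix} 1 \\ i \end{pmatrix}. \]

So the eigenvalues of this unitary operator are $e^{i\theta}$ and $e^{-i\theta}$, both of absolute value 1. Moreover, $\frac{1}{\sqrt{2}}(1, -i)$ and $\frac{1}{\sqrt{2}}(1, i)$ are orthogonal unit vectors, because $\langle (1, -i), (1, i) \rangle = 1 \cdot 1 + (-i)\overline{i} = 1 + i^2 = 0$, so they form an orthonormal basis of $\mathbf{C}^2$ consisting of eigenvectors of $S$.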


The next result characterizes unitary operators on finite-dimensional complex 
inner product spaces, using the complex spectral theorem as the main tool. 


7.55 description of unitary operators on complex inner product spaces 


Suppose $\mathbf{F} = \mathbf{C}$ and $S \in \mathcal{L}(V)$. Then the following are equivalent.

(a) $S$ is a unitary operator.

(b) There is an orthonormal basis of $V$ consisting of eigenvectors of $S$ whose
corresponding eigenvalues all have absolute value 1.


Proof Suppose (a) holds, so $S$ is a unitary operator. The equivalence of (a) and
(b) in 7.53 shows that $S$ is normal. Thus the complex spectral theorem (7.31)
shows that there is an orthonormal basis $e_1, \dots, e_n$ of $V$ consisting of eigenvectors
of $S$. Every eigenvalue of $S$ has absolute value 1 (by 7.54), completing the proof
that (a) implies (b).

Now suppose (b) holds. Let $e_1, \dots, e_n$ be an orthonormal basis of $V$ consisting
of eigenvectors of $S$ whose corresponding eigenvalues $\lambda_1, \dots, \lambda_n$ all have absolute
value 1. Then $Se_1, \dots, Se_n$ is also an orthonormal basis of $V$ because
\[ \langle Se_j, Se_k \rangle = \langle \lambda_j e_j, \lambda_k e_k \rangle = \lambda_j \overline{\lambda_k} \langle e_j, e_k \rangle = \begin{cases} 0 & \text{if } j \ne k, \\ 1 & \text{if } j = k \end{cases} \]
for all $j, k = 1, \dots, n$. Thus the equivalence of (a) and (d) in 7.53 shows that $S$ is
unitary, proving that (b) implies (a).


QR Factorization


In this subsection, we shift our attention from operators to matrices. This switch 
should give you good practice in identifying an operator with a square matrix 
(after picking a basis of the vector space on which the operator is defined). You 
should also become more comfortable with translating concepts and results back 
and forth between the context of operators and the context of square matrices. 

When starting with $n$-by-$n$ matrices instead of operators, unless otherwise
specified assume that the associated operators live on $\mathbf{F}^n$ (with the Euclidean inner
product) and that their matrices are computed with respect to the standard basis
of $\mathbf{F}^n$.

We begin by making the following definition, transferring the notion of a 
unitary operator to a unitary matrix. 


7.56 definition: unitary matrix 


An $n$-by-$n$ matrix is called unitary if its columns form an orthonormal list
in $\mathbf{F}^n$.


In the definition above, we could have replaced “orthonormal list in $\mathbf{F}^n$” with
“orthonormal basis of $\mathbf{F}^n$” because every orthonormal list of length $n$ in an
$n$-dimensional inner product space is an orthonormal basis. If $S \in \mathcal{L}(V)$ and
$e_1, \dots, e_n$ and $f_1, \dots, f_n$ are orthonormal bases of $V$, then $S$ is a unitary operator
if and only if $\mathcal{M}(S, (e_1, \dots, e_n), (f_1, \dots, f_n))$ is a unitary matrix, as shown by the
equivalence of (a) and (e) in 7.49. Also note that we could have replaced
“columns” in the definition above with “rows” by using the equivalence between
conditions (a) and (e) in 7.53.

The next result, whose proof will be left as an exercise for the reader, gives
some equivalent conditions for a square matrix to be unitary. In (c), $Qv$ denotes
the matrix product of $Q$ and $v$, identifying elements of $\mathbf{F}^n$ with $n$-by-1 matrices
(sometimes called column vectors). The norm in (c) below is the usual Euclidean
norm on $\mathbf{F}^n$ that comes from the Euclidean inner product. In (d), $Q^*$ denotes
the conjugate transpose of the matrix $Q$, which corresponds to the adjoint of the
associated operator.


7.57 characterizations of unitary matrices 


Suppose $Q$ is an $n$-by-$n$ matrix. Then the following are equivalent.

(a) $Q$ is a unitary matrix.

(b) The rows of $Q$ form an orthonormal list in $\mathbf{F}^n$.

(c) $\|Qv\| = \|v\|$ for every $v \in \mathbf{F}^n$.

(d) $Q^*Q = QQ^* = I$, the $n$-by-$n$ matrix with 1’s on the diagonal and 0’s
elsewhere.


The QR factorization stated and proved below is the main tool in the widely
used QR algorithm (not discussed here) for finding good approximations to
eigenvalues and eigenvectors of square matrices. In the result below, if the matrix
$A$ is in $\mathbf{F}^{n,n}$, then the matrices $Q$ and $R$ are also in $\mathbf{F}^{n,n}$.


7.58 QR factorization 


Suppose $A$ is a square matrix with linearly independent columns. Then there
exist unique matrices $Q$ and $R$ such that $Q$ is unitary, $R$ is upper triangular
with only positive numbers on its diagonal, and
\[ A = QR. \]


Proof Let $v_1, \dots, v_n$ denote the columns of $A$, thought of as elements of $\mathbf{F}^n$. Apply
the Gram-Schmidt procedure (6.32) to the list $v_1, \dots, v_n$, getting an orthonormal
basis $e_1, \dots, e_n$ of $\mathbf{F}^n$ such that

7.59 \[ \operatorname{span}(v_1, \dots, v_k) = \operatorname{span}(e_1, \dots, e_k) \]
for each $k = 1, \dots, n$. Let $R$ be the $n$-by-$n$ matrix defined by
\[ R_{j,k} = \langle v_k, e_j \rangle, \]
where $R_{j,k}$ denotes the entry in row $j$, column $k$ of $R$. If $j > k$, then $e_j$ is orthogonal
to $\operatorname{span}(e_1, \dots, e_k)$ and hence $e_j$ is orthogonal to $v_k$ (by 7.59). In other words, if
$j > k$ then $\langle v_k, e_j \rangle = 0$. Thus $R$ is an upper-triangular matrix.

Let $Q$ be the unitary matrix whose columns are $e_1, \dots, e_n$. If $k \in \{1, \dots, n\}$,
then the $k$th column of $QR$ equals a linear combination of the columns of $Q$, with
the coefficients for the linear combination coming from the $k$th column of $R$ [see
3.51(a)]. Hence the $k$th column of $QR$ equals
\[ \langle v_k, e_1 \rangle e_1 + \cdots + \langle v_k, e_k \rangle e_k, \]
which equals $v_k$ [by 6.30(a)], the $k$th column of $A$. Thus $A = QR$, as desired.

The equations defining the Gram-Schmidt procedure (see 6.32) show that
each $v_k$ equals a positive multiple of $e_k$ plus a linear combination of $e_1, \dots, e_{k-1}$.
Thus each $\langle v_k, e_k \rangle$ is a positive number. Hence all entries on the diagonal of $R$ are
positive numbers, as desired.

Finally, to show that $Q$ and $R$ are unique, suppose we also have $A = \hat{Q}\hat{R}$, where
$\hat{Q}$ is unitary and $\hat{R}$ is upper triangular with only positive numbers on its diagonal.
Let $q_1, \dots, q_n$ denote the columns of $\hat{Q}$. Thinking of matrix multiplication as above,
we see that each $v_k$ is a linear combination of $q_1, \dots, q_k$, with the coefficients coming
from the $k$th column of $\hat{R}$. This implies that $\operatorname{span}(v_1, \dots, v_k) = \operatorname{span}(q_1, \dots, q_k)$ and
$\langle v_k, q_k \rangle > 0$. The uniqueness of the orthonormal lists satisfying these conditions
(see Exercise 10 in Section 6B) now shows that $q_k = e_k$ for each $k = 1, \dots, n$. Hence
$\hat{Q} = Q$, which then implies that $\hat{R} = R$, completing the proof of uniqueness.

The proof of the QR factorization shows that the columns of the unitary matrix
can be computed by applying the Gram-Schmidt procedure to the columns of the
matrix to be factored. The next example illustrates the computation of the QR
factorization based on the proof that we just completed.


7.60 example: QR factorization of a 3-by-3 matrix 


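To find the QR factorization of the matrix
\[ A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & -4 \\ 0 & 3 & 2 \end{pmatrix}, \]
follow the proof of 7.58. Thus set $v_1, v_2, v_3$ equal to the columns of $A$:
\[ v_1 = (1, 0, 0), \quad v_2 = (2, 1, 3), \quad v_3 = (1, -4, 2). \]
Apply the Gram-Schmidt procedure to $v_1, v_2, v_3$, producing the orthonormal list
\[ e_1 = (1, 0, 0), \quad e_2 = \Big(0, \tfrac{1}{\sqrt{10}}, \tfrac{3}{\sqrt{10}}\Big), \quad e_3 = \Big(0, -\tfrac{3}{\sqrt{10}}, \tfrac{1}{\sqrt{10}}\Big). \]

Still following the proof of 7.58, let $Q$ be the unitary matrix whose columns are
$e_1, e_2, e_3$:
\[ Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{10}} & -\frac{3}{\sqrt{10}} \\ 0 & \frac{3}{\sqrt{10}} & \frac{1}{\sqrt{10}} \end{pmatrix}. \]

As in the proof of 7.58, let $R$ be the 3-by-3 matrix whose entry in row $j$, column $k$
is $\langle v_k, e_j \rangle$, which gives
\[ R = \begin{pmatrix} 1 & 2 & 1 \\ 0 & \sqrt{10} & \frac{\sqrt{10}}{5} \\ 0 & 0 & \frac{7\sqrt{10}}{5} \end{pmatrix}. \]

Note that $R$ is indeed an upper-triangular matrix with only positive numbers on
the diagonal, as required by the QR factorization.

Now matrix multiplication can verify that $A = QR$ is the desired factorization
of $A$:
\[ QR = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{10}} & -\frac{3}{\sqrt{10}} \\ 0 & \frac{3}{\sqrt{10}} & \frac{1}{\sqrt{10}} \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 \\ 0 & \sqrt{10} & \frac{\sqrt{10}}{5} \\ 0 & 0 & \frac{7\sqrt{10}}{5} \end{pmatrix} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & -4 \\ 0 & 3 & 2 \end{pmatrix} = A. \]

Thus $A = QR$, as expected.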
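
The construction in the proof of 7.58 translates fairly directly into code. The following is a minimal sketch, assuming a real square matrix stored as a list of rows; the function name `qr_gram_schmidt` is hypothetical, and the classical Gram-Schmidt procedure used here is meant to mirror the proof rather than to be numerically robust. It is applied to the matrix of Example 7.60.

```python
import math

def qr_gram_schmidt(a):
    """QR factorization of a real square matrix (list of rows) with
    linearly independent columns, via Gram-Schmidt on the columns."""
    n = len(a)
    cols = [[a[i][k] for i in range(n)] for k in range(n)]   # v_1, ..., v_n
    es = []                                                  # e_1, ..., e_n
    r = [[0.0] * n for _ in range(n)]
    for k, v in enumerate(cols):
        w = list(v)
        for j, e in enumerate(es):
            r[j][k] = sum(x * y for x, y in zip(v, e))       # R[j][k] = <v_k, e_j>
            w = [wi - r[j][k] * ei for wi, ei in zip(w, e)]
        r[k][k] = math.sqrt(sum(x * x for x in w))           # equals <v_k, e_k> > 0
        es.append([x / r[k][k] for x in w])
    q = [[es[k][i] for k in range(n)] for i in range(n)]     # columns of Q are the e_k
    return q, r

A = [[1, 2, 1],
     [0, 1, -4],
     [0, 3, 2]]
Q, R = qr_gram_schmidt(A)

# Multiply Q and R back together to compare with A.
QR = [[sum(Q[i][k] * R[k][j] for k in range(3)) for j in range(3)]
      for i in range(3)]
for row in R:
    print([round(x, 4) for x in row])
```

Up to rounding, the computed `R` should agree with the matrix $R$ found in Example 7.60, and `QR` should reproduce `A`.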


The QR factorization will be the major tool used in the proof of the Cholesky
factorization (7.63) in the next subsection. For another nice application of the QR
factorization, see the proof of Hadamard’s inequality (9.66).

If a QR factorization is available, then it can be used to solve a corresponding
system of linear equations without using Gaussian elimination. Specifically,
suppose $A$ is an $n$-by-$n$ square matrix with linearly independent columns. Suppose
that $b \in \mathbf{F}^n$ and we want to solve the equation $Ax = b$ for $x = (x_1, \dots, x_n) \in \mathbf{F}^n$
(as usual, we are identifying elements of $\mathbf{F}^n$ with $n$-by-1 column vectors).

Suppose $A = QR$, where $Q$ is unitary and $R$ is upper triangular with only
positive numbers on its diagonal ($Q$ and $R$ are computable from $A$ using just the
Gram-Schmidt procedure, as shown in the proof of 7.58). The equation $Ax = b$ is
equivalent to the equation $QRx = b$. Multiplying both sides of this last equation
by $Q^*$ on the left and using 7.57(d) gives the equation
\[ Rx = Q^*b. \]

The matrix $Q^*$ is the conjugate transpose of the matrix $Q$. Thus computing
$Q^*b$ is straightforward. Because $R$ is an upper-triangular matrix with positive
numbers on its diagonal, the system of linear equations represented by the equation
above can quickly be solved by first solving for $x_n$, then for $x_{n-1}$, and so on.
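
The procedure just described (form $Q^*b$, then solve $Rx = Q^*b$ from the last unknown upward) can be sketched as follows. This is only an illustration for real matrices, where $Q^*$ is the transpose of $Q$; the function names and the right-hand side `b` are assumptions made for the example, and the Gram-Schmidt helper is repeated so that the block is self-contained.

```python
import math

def qr_gram_schmidt(a):
    """Q and R for a real square matrix with linearly independent columns."""
    n = len(a)
    es = []                                  # orthonormal vectors e_1, ..., e_n
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        v = [a[i][k] for i in range(n)]      # k-th column of A
        w = list(v)
        for j, e in enumerate(es):
            r[j][k] = sum(x * y for x, y in zip(v, e))
            w = [wi - r[j][k] * ei for wi, ei in zip(w, e)]
        r[k][k] = math.sqrt(sum(x * x for x in w))
        es.append([x / r[k][k] for x in w])
    q = [[es[k][i] for k in range(n)] for i in range(n)]
    return q, r

def solve_via_qr(a, b):
    """Solve Ax = b by forming Q*b and back-substituting in Rx = Q*b."""
    n = len(a)
    q, r = qr_gram_schmidt(a)
    # For real Q, the conjugate transpose Q* is just the transpose.
    c = [sum(q[i][j] * b[i] for i in range(n)) for j in range(n)]
    x = [0.0] * n
    for j in reversed(range(n)):             # x_n first, then x_{n-1}, ...
        tail = sum(r[j][k] * x[k] for k in range(j + 1, n))
        x[j] = (c[j] - tail) / r[j][j]
    return x

A = [[1, 2, 1], [0, 1, -4], [0, 3, 2]]       # the matrix from Example 7.60
b = [4, -3, 5]                               # chosen so that x = (1, 1, 1)
x = solve_via_qr(A, b)
print(x)                                     # approximately [1.0, 1.0, 1.0]
```

The division by `r[j][j]` is safe because the diagonal entries of $R$ are positive.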


Cholesky Factorization 


We begin this subsection with a characterization of positive invertible operators 
in terms of inner products. 


7.61 positive invertible operator 


A self-adjoint operator $T \in \mathcal{L}(V)$ is a positive invertible operator if and only
if $\langle Tv, v \rangle > 0$ for every nonzero $v \in V$.


Proof First suppose $T$ is a positive invertible operator. If $v \in V$ and $v \ne 0$, then
because $T$ is invertible we have $Tv \ne 0$. This implies that $\langle Tv, v \rangle \ne 0$ (by 7.43).
Hence $\langle Tv, v \rangle > 0$.

To prove the implication in the other direction, suppose now that $\langle Tv, v \rangle > 0$
for every nonzero $v \in V$. Because $T$ is self-adjoint and $\langle Tv, v \rangle \ge 0$ for every
$v \in V$, the operator $T$ is positive. Furthermore, $Tv \ne 0$ for every nonzero $v \in V$.
Hence $T$ is injective. Thus $T$ is invertible, as desired.
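
As a small worked illustration of 7.61 (an added example, not from the text), let $T$ be the operator on $\mathbf{F}^2$ whose matrix with respect to the standard basis is $\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$. This matrix equals its conjugate transpose, so $T$ is self-adjoint. For $x = (x_1, x_2) \in \mathbf{F}^2$ we have

\[ \langle Tx, x \rangle = (2x_1 + x_2)\overline{x_1} + (x_1 + 2x_2)\overline{x_2} = 2|x_1|^2 + 2\operatorname{Re}(x_1 \overline{x_2}) + 2|x_2|^2 = |x_1|^2 + |x_2|^2 + |x_1 + x_2|^2, \]

which is positive whenever $x \ne 0$. Hence $T$ is a positive invertible operator.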


The next definition transfers the result above to the language of matrices. Here
we are using the usual Euclidean inner product on $\mathbf{F}^n$ and identifying elements of
$\mathbf{F}^n$ with $n$-by-1 column vectors.


7.62 definition: positive definite

A matrix $B \in \mathbf{F}^{n,n}$ is called positive definite if $B^* = B$ and
\[ \langle Bx, x \rangle > 0 \]
for every nonzero $x \in \mathbf{F}^n$.


A matrix is upper triangular if and only if its conjugate transpose is lower
triangular (meaning that all entries above the diagonal are 0). The factorization
below, which has important consequences in computational linear algebra, writes
a positive definite matrix as the product of a lower triangular matrix and its
conjugate transpose.

Our next result is solely about matrices, although the proof makes use of the
identification of results about operators with results about square matrices. In the
result below, if the matrix $B$ is in $\mathbf{F}^{n,n}$, then the matrix $R$ is also in $\mathbf{F}^{n,n}$.


7.63 Cholesky factorization 


Suppose $B$ is a positive definite matrix. Then there exists a unique
upper-triangular matrix $R$ with only positive numbers on its diagonal such that
\[ B = R^*R. \]


Proof Because $B$ is positive definite, there exists an invertible square matrix $A$
of the same size as $B$ such that $B = A^*A$ [by the equivalence of (a) and (f) in
7.38].

Let $A = QR$ be the QR factorization of $A$ (see 7.58), where $Q$ is unitary and $R$
is upper triangular with only positive numbers on its diagonal. Then $A^* = R^*Q^*$.
Thus
\[ B = A^*A = R^*Q^*QR = R^*R, \]
as desired.

André-Louis Cholesky (1875-1918) discovered this factorization, which
was published posthumously in 1924.

To prove the uniqueness part of this result, suppose $S$ is an upper-triangular
matrix with only positive numbers on its diagonal and $B = S^*S$. The matrix $S$ is
invertible because $B$ is invertible (see Exercise 11 in Section 3D). Multiplying both
sides of the equation $B = S^*S$ by $S^{-1}$ on the right gives the equation $BS^{-1} = S^*$.

Let $A$ be the matrix from the first paragraph of this proof. Then
\[ (AS^{-1})^*(AS^{-1}) = (S^*)^{-1}A^*AS^{-1} = (S^*)^{-1}BS^{-1} = (S^*)^{-1}S^* = I. \]

Thus $AS^{-1}$ is unitary.

Hence $A = (AS^{-1})S$ is a factorization of $A$ as the product of a unitary matrix
and an upper-triangular matrix with only positive numbers on its diagonal. The
uniqueness of the QR factorization, as stated in 7.58, now implies that $S = R$.
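
In practice the matrix $R$ of 7.63 is usually computed entry by entry rather than through a QR factorization. The sketch below uses that standard recurrence, which is a different technique from the argument in the proof above; it assumes a real symmetric positive definite matrix stored as a list of rows, and the function name `cholesky_upper` and the sample matrix are hypothetical choices for illustration.

```python
import math

def cholesky_upper(b):
    """Return upper-triangular R with positive diagonal and R^T R = B,
    for a real symmetric positive definite matrix B (list of rows)."""
    n = len(b)
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: B[j][j] = sum over i <= j of R[i][j]^2.
        d = b[j][j] - sum(r[i][j] ** 2 for i in range(j))
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        r[j][j] = math.sqrt(d)
        for k in range(j + 1, n):
            # Off-diagonal entry: B[j][k] = sum over i <= j of R[i][j] * R[i][k].
            r[j][k] = (b[j][k] - sum(r[i][j] * r[i][k] for i in range(j))) / r[j][j]
    return r

B = [[4, 2, 2],
     [2, 5, 3],
     [2, 3, 6]]
R = cholesky_upper(B)

# Check the factorization by forming R^T R.
RtR = [[sum(R[i][j] * R[i][k] for i in range(3)) for k in range(3)]
       for j in range(3)]
print(R)
```

For this sample matrix the recurrence gives an upper-triangular $R$ with rows $(2, 1, 1)$, $(0, 2, 1)$, $(0, 0, 2)$, and by the uniqueness part of 7.63 this is the only such factor.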


In the first paragraph of the proof above, we could have chosen $A$ to be the
unique positive definite matrix that is a square root of $B$ (see 7.39). However,
the proof was presented with the more general choice of $A$ because for specific
positive definite matrices $B$, it may be easier to find a different choice of $A$.


Exercises 7D


1  Suppose $\dim V \ge 2$ and $S \in \mathcal{L}(V, W)$. Prove that $S$ is an isometry if and
only if $Se_1, Se_2$ is an orthonormal list in $W$ for every orthonormal list $e_1, e_2$
of length two in $V$.


2  Suppose $T \in \mathcal{L}(V, W)$. Prove that $T$ is a scalar multiple of an isometry if
and only if $T$ preserves orthogonality.

The phrase “$T$ preserves orthogonality” means that $\langle Tu, Tv \rangle = 0$ for all
$u, v \in V$ such that $\langle u, v \rangle = 0$.


3  (a) Show that the product of two unitary operators on $V$ is a unitary operator.
(b) Show that the inverse of a unitary operator on $V$ is a unitary operator.

This exercise shows that the set of unitary operators on $V$ is a group, where
the group operation is the usual product of two operators.


4  Suppose $\mathbf{F} = \mathbf{C}$ and $A, B \in \mathcal{L}(V)$ are self-adjoint. Show that $A + iB$ is
unitary if and only if $AB = BA$ and $A^2 + B^2 = I$.


5  Suppose $S \in \mathcal{L}(V)$. Prove that the following are equivalent.

(a) $S$ is a self-adjoint unitary operator.

(b) $S = 2P - I$ for some orthogonal projection $P$ on $V$.

(c) There exists a subspace $U$ of $V$ such that $Su = u$ for every $u \in U$ and
$Sw = -w$ for every $w \in U^\perp$.


6  Suppose $T_1, T_2$ are both normal operators on $\mathbf{F}^3$ with 2, 5, 7 as eigenvalues.
Prove that there exists a unitary operator $S \in \mathcal{L}(\mathbf{F}^3)$ such that $T_1 = S^*T_2S$.


7  Give an example of two self-adjoint operators $T_1, T_2 \in \mathcal{L}(\mathbf{F}^4)$ such that the
eigenvalues of both operators are 2, 5, 7 but there does not exist a unitary
operator $S \in \mathcal{L}(\mathbf{F}^4)$ such that $T_1 = S^*T_2S$. Be sure to explain why there is
no unitary operator with the required property.


8  Prove or give a counterexample: If $S \in \mathcal{L}(V)$ and there exists an orthonormal
basis $e_1, \dots, e_n$ of $V$ such that $\|Se_k\| = 1$ for each $e_k$, then $S$ is a unitary operator.


9  Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Suppose every eigenvalue of $T$ has absolute
value 1 and $\|Tv\| \le \|v\|$ for every $v \in V$. Prove that $T$ is a unitary operator.


10  Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$ is a self-adjoint operator such that $\|Tv\| \le \|v\|$
for all $v \in V$.

(a) Show that $I - T^2$ is a positive operator.

(b) Show that $T + i\sqrt{I - T^2}$ is a unitary operator.


11  Suppose $S \in \mathcal{L}(V)$. Prove that $S$ is a unitary operator if and only if
\[ \{Sv : v \in V \text{ and } \|v\| \le 1\} = \{v \in V : \|v\| \le 1\}. \]


12  Prove or give a counterexample: If $S \in \mathcal{L}(V)$ is invertible and $\|S^{-1}v\| = \|Sv\|$
for every $v \in V$, then $S$ is unitary.


13  Explain why the columns of a square matrix of complex numbers form
an orthonormal list in $\mathbf{C}^n$ if and only if the rows of the matrix form an
orthonormal list in $\mathbf{C}^n$.


14  Suppose $v \in V$ with $\|v\| = 1$ and $b \in \mathbf{F}$. Also suppose $\dim V \ge 2$. Prove
that there exists a unitary operator $S \in \mathcal{L}(V)$ such that $\langle Sv, v \rangle = b$ if and
only if $|b| \le 1$.


15  Suppose $T$ is a unitary operator on $V$ such that $T - I$ is invertible.

(a) Prove that $(T + I)(T - I)^{-1}$ is a skew operator (meaning that it equals
the negative of its adjoint).

(b) Prove that if $\mathbf{F} = \mathbf{C}$, then $i(T + I)(T - I)^{-1}$ is a self-adjoint operator.

The function $z \mapsto i(z + 1)(z - 1)^{-1}$ maps the unit circle in $\mathbf{C}$ (except for the
point 1) to $\mathbf{R}$. Thus (b) illustrates the analogy between the unitary operators
and the unit circle in $\mathbf{C}$, along with the analogy between the self-adjoint
operators and $\mathbf{R}$.


16  Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$ is self-adjoint. Prove that $(T + iI)(T - iI)^{-1}$
is a unitary operator and 1 is not an eigenvalue of this operator.


17  Explain why the characterization of unitary matrices given by 7.57 holds.


18  A square matrix $A$ is called symmetric if it equals its transpose. Prove that if
$A$ is a symmetric matrix with real entries, then there exists a unitary matrix
$Q$ with real entries such that $Q^*AQ$ is a diagonal matrix.


19  Suppose $n$ is a positive integer. For this exercise, we adopt the notation that
a typical element $z$ of $\mathbf{C}^n$ is denoted by $z = (z_0, z_1, \dots, z_{n-1})$. Define linear
functionals $\omega_0, \omega_1, \dots, \omega_{n-1}$ on $\mathbf{C}^n$ by
\[ \omega_j(z_0, z_1, \dots, z_{n-1}) = \frac{1}{\sqrt{n}} \sum_{m=0}^{n-1} z_m e^{-2\pi i j m / n}. \]

The discrete Fourier transform is the operator $\mathcal{F} : \mathbf{C}^n \to \mathbf{C}^n$ defined by
\[ \mathcal{F}z = (\omega_0(z), \omega_1(z), \dots, \omega_{n-1}(z)). \]

(a) Show that $\mathcal{F}$ is a unitary operator on $\mathbf{C}^n$.

(b) Show that if $(z_0, \dots, z_{n-1}) \in \mathbf{C}^n$ and $z_n$ is defined to equal $z_0$, then
\[ \mathcal{F}^{-1}(z_0, z_1, \dots, z_{n-1}) = \mathcal{F}(z_n, z_{n-1}, \dots, z_1). \]

(c) Show that $\mathcal{F}^4 = I$.

The discrete Fourier transform has many important applications in data
analysis. The usual Fourier transform involves expressions of the form
$\int_{-\infty}^{\infty} f(x) e^{-2\pi i t x} \, dx$ for complex-valued integrable functions $f$ defined on $\mathbf{R}$.


20  Suppose $A$ is a square matrix with linearly independent columns. Prove that
there exist unique matrices $R$ and $Q$ such that $R$ is lower triangular with only
positive numbers on its diagonal, $Q$ is unitary, and $A = RQ$.


Singular Values


We will need the following result in this section. 


7.64 properties of T*T 


Suppose $T \in \mathcal{L}(V, W)$. Then

(a) $T^*T$ is a positive operator on $V$;

(b) $\operatorname{null} T^*T = \operatorname{null} T$;

(c) $\operatorname{range} T^*T = \operatorname{range} T^*$;

(d) $\dim \operatorname{range} T = \dim \operatorname{range} T^* = \dim \operatorname{range} T^*T$.


Proof
(a) We have
\[ (T^*T)^* = T^*(T^*)^* = T^*T. \]
Thus $T^*T$ is self-adjoint.
If $v \in V$, then
\[ \langle (T^*T)v, v \rangle = \langle T^*(Tv), v \rangle = \langle Tv, Tv \rangle = \|Tv\|^2 \ge 0. \]
Thus $T^*T$ is a positive operator.

(b) First suppose $v \in \operatorname{null} T^*T$. Then
\[ \|Tv\|^2 = \langle Tv, Tv \rangle = \langle T^*Tv, v \rangle = \langle 0, v \rangle = 0. \]
Thus $Tv = 0$, proving that $\operatorname{null} T^*T \subseteq \operatorname{null} T$.

The inclusion in the other direction is clear, because if $v \in V$ and $Tv = 0$,
then $T^*Tv = 0$.

Thus $\operatorname{null} T^*T = \operatorname{null} T$, completing the proof of (b).

(c) We already know from (a) that $T^*T$ is self-adjoint. Thus
\[ \operatorname{range} T^*T = (\operatorname{null} T^*T)^\perp = (\operatorname{null} T)^\perp = \operatorname{range} T^*, \]
where the first and last equalities come from 7.6 and the second equality
comes from (b).

(d) To verify the first equation in (d), note that
\[ \dim \operatorname{range} T = \dim (\operatorname{null} T^*)^\perp = \dim W - \dim \operatorname{null} T^* = \dim \operatorname{range} T^*, \]
where the first equality comes from 7.6(d), the second equality comes from
6.51, and the last equality comes from the fundamental theorem of linear
maps (3.21).

The equality $\dim \operatorname{range} T^* = \dim \operatorname{range} T^*T$ follows from (c).


The eigenvalues of an operator tell us something about the behavior of the
operator. Another collection of numbers, called the singular values, is also useful.
Eigenspaces and the notation $E$ (used in the examples) were defined in 5.52.


7.65 definition: singular values 


Suppose $T \in \mathcal{L}(V, W)$. The singular values of $T$ are the nonnegative square
roots of the eigenvalues of $T^*T$, listed in decreasing order, each included as
many times as the dimension of the corresponding eigenspace of $T^*T$.


7.66 example: singular values of an operator on $\mathbf{F}^4$


Define $T \in \mathcal{L}(\mathbf{F}^4)$ by $T(z_1, z_2, z_3, z_4) = (0, 3z_1, 2z_2, -3z_4)$. A calculation
shows that
\[ T^*T(z_1, z_2, z_3, z_4) = (9z_1, 4z_2, 0, 9z_4), \]
as you should verify. Thus the standard basis of $\mathbf{F}^4$ diagonalizes $T^*T$, and we
see that the eigenvalues of $T^*T$ are 9, 4, and 0. Also, the dimensions of the
eigenspaces corresponding to the eigenvalues are
\[ \dim E(9, T^*T) = 2 \quad\text{and}\quad \dim E(4, T^*T) = 1 \quad\text{and}\quad \dim E(0, T^*T) = 1. \]

Taking nonnegative square roots of these eigenvalues of $T^*T$ and using dimension
information from above, we conclude that the singular values of $T$ are 3, 3, 2, 0.

The only eigenvalues of $T$ are $-3$ and 0. Thus in this case, the collection of
eigenvalues did not pick up the number 2 that appears in the definition (and hence
the behavior) of $T$, but the list of singular values does include 2.
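
The computation in Example 7.66 can be reproduced with a few lines of code. The sketch below is an illustration only: it builds the matrix of $T$ with respect to the standard basis, forms $T^*T$, and then reads the eigenvalues off the diagonal, a shortcut that is valid here only because $T^*T$ happens to be diagonal.

```python
import math

# Matrix of T from Example 7.66 with respect to the standard basis of F^4:
# T(z1, z2, z3, z4) = (0, 3*z1, 2*z2, -3*z4).
T = [[0, 0, 0, 0],
     [3, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 0, -3]]
n = 4

# T*T; the entries of T are real, so the conjugate transpose is the transpose.
TstarT = [[sum(T[i][j] * T[i][k] for i in range(n)) for k in range(n)]
          for j in range(n)]

# Shortcut valid only because T*T is diagonal in this example:
# its eigenvalues are then exactly its diagonal entries.
assert all(TstarT[j][k] == 0 for j in range(n) for k in range(n) if j != k)

singular_values = sorted((math.sqrt(TstarT[j][j]) for j in range(n)),
                         reverse=True)
print(singular_values)   # [3.0, 3.0, 2.0, 0.0]
```

For a matrix whose $T^*T$ is not diagonal, the eigenvalues of $T^*T$ would have to be found some other way before taking square roots.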


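7.67 example: singular values of a linear map from $\mathbf{F}^4$ to $\mathbf{F}^3$


Suppose $T \in \mathcal{L}(\mathbf{F}^4, \mathbf{F}^3)$ has matrix (with respect to the standard bases)
\[ \begin{pmatrix} 0 & 0 & 0 & -5 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{pmatrix}. \]

You can verify that the matrix of $T^*T$ is
\[ \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 25 \end{pmatrix} \]
and that the eigenvalues of the operator $T^*T$ are 25, 2, 0, with $\dim E(25, T^*T) = 1$,
$\dim E(2, T^*T) = 1$, and $\dim E(0, T^*T) = 2$. Thus the singular values of $T$ are
\[ 5, \sqrt{2}, 0, 0. \]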
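
One way to carry out the verification in Example 7.67 (a worked sketch added for illustration) is to use the block structure of the matrix of $T^*T$. Writing $M$ for that matrix, we get

\[ \det(M - \lambda I) = \big((1 - \lambda)^2 - 1\big)(-\lambda)(25 - \lambda) = \lambda^2(\lambda - 2)(\lambda - 25), \]

so the eigenvalues of $T^*T$ are 25, 2, and 0. Eigenvectors can be read off directly: $(0, 0, 0, 1)$ for the eigenvalue 25, $(1, 1, 0, 0)$ for the eigenvalue 2, and the linearly independent vectors $(1, -1, 0, 0)$ and $(0, 0, 1, 0)$ for the eigenvalue 0, which matches the eigenspace dimensions 1, 1, and 2 stated in the example.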


See Exercise 2 for a characterization of the positive singular values.


7.68 role of positive singular values


Suppose that $T \in \mathcal{L}(V, W)$. Then

(a) $T$ is injective $\Longleftrightarrow$ 0 is not a singular value of $T$;

(b) the number of positive singular values of $T$ equals $\dim \operatorname{range} T$;

(c) $T$ is surjective $\Longleftrightarrow$ number of positive singular values of $T$ equals $\dim W$.


Proof The linear map $T$ is injective if and only if $\operatorname{null} T = \{0\}$, which happens
if and only if $\operatorname{null} T^*T = \{0\}$ [by 7.64(b)], which happens if and only if 0 is not
an eigenvalue of $T^*T$, which happens if and only if 0 is not a singular value of $T$,
completing the proof of (a).

The spectral theorem applied to $T^*T$ shows that $\dim \operatorname{range} T^*T$ equals the
number of positive eigenvalues of $T^*T$ (counting repetitions). Thus 7.64(d) implies
that $\dim \operatorname{range} T$ equals the number of positive singular values of $T$, proving (b).

Use (b) and 2.39 to show that (c) holds.


The table below compares eigenvalues with singular values. 


list of eigenvalues | list of singular values
context: vector spaces | context: inner product spaces
defined only for linear maps from a vector space to itself | defined for linear maps from an inner product space to a possibly different inner product space
can be arbitrary real numbers (if F = R) or complex numbers (if F = C) | are nonnegative numbers
can be the empty list if F = R | length of list equals dimension of domain
includes 0 $\iff$ operator is not invertible | includes 0 $\iff$ linear map is not injective
no standard order, especially if F = C | always listed in decreasing order


The next result nicely characterizes isometries in terms of singular values. 


7.69 isometries characterized by having all singular values equal 1 


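Suppose that $S \in \mathcal{L}(V, W)$. Then

\[ S \text{ is an isometry} \iff \text{all singular values of } S \text{ equal } 1. \]

Proof We have

\[ \begin{aligned} S \text{ is an isometry} &\iff S^*S = I \\ &\iff \text{all eigenvalues of } S^*S \text{ equal } 1 \\ &\iff \text{all singular values of } S \text{ equal } 1, \end{aligned} \]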


where the first equivalence comes from 7.49 and the second equivalence comes 
from the spectral theorem (7.29 or 7.31) applied to the self-adjoint operator S*S. 

SVD for Linear Maps and for Matrices

The next result shows that every linear map from $V$ to $W$ has a remarkably clean
description in terms of its singular values and orthonormal lists in $V$ and $W$.
In the next section we will see several important applications of the singular
value decomposition (often called the SVD).

The singular value decomposition is useful in computational linear algebra
because good techniques exist for approximating eigenvalues and eigenvectors of
positive operators such as $T^*T$, whose eigenvalues and eigenvectors lead to the
singular value decomposition.


7.70 singular value decomposition 


Suppose $T \in \mathcal{L}(V, W)$ and the positive singular values of $T$ are $s_1, \dots, s_m$.
Then there exist orthonormal lists $e_1, \dots, e_m$ in $V$ and $f_1, \dots, f_m$ in $W$ such that

7.71 \[ Tv = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_m \langle v, e_m \rangle f_m \]

for every $v \in V$.


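Proof Let $s_1, \dots, s_n$ denote the singular values of $T$ (thus $n = \dim V$). Because
$T^*T$ is a positive operator [see 7.64(a)], the spectral theorem implies that there
exists an orthonormal basis $e_1, \dots, e_n$ of $V$ with

7.72 \[ T^*Te_k = s_k^2 e_k \]

for each $k = 1, \dots, n$.
For each $k = 1, \dots, m$, let

7.73 \[ f_k = \frac{Te_k}{s_k}. \]

If $j, k \in \{1, \dots, m\}$, then

\[ \langle f_j, f_k \rangle = \frac{1}{s_j s_k} \langle Te_j, Te_k \rangle = \frac{1}{s_j s_k} \langle e_j, T^*Te_k \rangle = \frac{s_k}{s_j} \langle e_j, e_k \rangle = \begin{cases} 0 & \text{if } j \ne k, \\ 1 & \text{if } j = k. \end{cases} \]

Thus $f_1, \dots, f_m$ is an orthonormal list in $W$.
If $k \in \{1, \dots, n\}$ and $k > m$, then $s_k = 0$ and hence $T^*Te_k = 0$ (by 7.72), which
implies that $Te_k = 0$ [by 7.64(b)].
Suppose $v \in V$. Then

\[ \begin{aligned} Tv &= T\bigl( \langle v, e_1 \rangle e_1 + \cdots + \langle v, e_n \rangle e_n \bigr) \\ &= \langle v, e_1 \rangle Te_1 + \cdots + \langle v, e_m \rangle Te_m \\ &= s_1 \langle v, e_1 \rangle f_1 + \cdots + s_m \langle v, e_m \rangle f_m, \end{aligned} \]

where the last index in the first line switched from $n$ to $m$ in the second line
because $Te_k = 0$ if $k > m$ (as noted in the paragraph above) and the third line
follows from 7.73. The equation above is our desired result.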

Suppose $T \in \mathcal{L}(V, W)$, the positive singular values of $T$ are $s_1, \dots, s_m$, and
$e_1, \dots, e_m$ and $f_1, \dots, f_m$ are as in the singular value decomposition 7.70. The
orthonormal list $e_1, \dots, e_m$ can be extended to an orthonormal basis $e_1, \dots, e_{\dim V}$
of $V$ and the orthonormal list $f_1, \dots, f_m$ can be extended to an orthonormal basis
$f_1, \dots, f_{\dim W}$ of $W$. The formula 7.71 shows that

\[ Te_k = \begin{cases} s_k f_k & \text{if } 1 \le k \le m, \\ 0 & \text{if } m < k \le \dim V. \end{cases} \]

Thus the matrix of $T$ with respect to the orthonormal bases $(e_1, \dots, e_{\dim V})$ and
$(f_1, \dots, f_{\dim W})$ has the simple form

\[ \mathcal{M}\bigl(T, (e_1, \dots, e_{\dim V}), (f_1, \dots, f_{\dim W})\bigr)_{j,k} = \begin{cases} s_k & \text{if } 1 \le j = k \le m, \\ 0 & \text{otherwise.} \end{cases} \]

If dim V = dim W (as happens, for example, if W = V), then the matrix 
described in the paragraph above is a diagonal matrix. If we extend the definition 
of diagonal matrix as follows to apply to matrices that are not necessarily square, 
then we have proved the wonderful result that every linear map from V to W has 
a diagonal matrix with respect to appropriate orthonormal bases. 


7.74 definition: diagonal matrix 


An $M$-by-$N$ matrix $A$ is called a diagonal matrix if all entries of the matrix
are 0 except possibly $A_{k,k}$ for $k = 1, \dots, \min\{M, N\}$.


The table below compares the spectral theorem (7.29 and 7.31) with the 
singular value decomposition (7.70). 


spectral theorem | singular value decomposition
describes only self-adjoint operators (when F = R) or normal operators (when F = C) | describes arbitrary linear maps from an inner product space to a possibly different inner product space
produces a single orthonormal basis | produces two orthonormal lists, one for domain space and one for range space, that are not necessarily the same even when range space equals domain space
different proofs depending on whether F = R or F = C | same proof works regardless of whether F = R or F = C


The singular value decomposition gives us a new way to understand the adjoint
and the inverse of a linear map. Specifically, the next result shows that given a
singular value decomposition of a linear map $T \in \mathcal{L}(V, W)$, we can obtain the
adjoint of $T$ simply by interchanging the roles of the $e$'s and the $f$'s (see 7.77).
Similarly, we can obtain the pseudoinverse $T^\dagger$ (see 6.68) of $T$ by interchanging
the roles of the $e$'s and the $f$'s and replacing each positive singular value $s_k$ of $T$
with $1/s_k$ (see 7.78).



Recall that the pseudoinverse $T^\dagger$ in 7.78 below equals the inverse $T^{-1}$ if $T$ is
invertible [see 6.69(a)].


7.75 singular value decomposition of adjoint and pseudoinverse 


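Suppose $T \in \mathcal{L}(V, W)$ and the positive singular values of $T$ are $s_1, \dots, s_m$.
Suppose $e_1, \dots, e_m$ and $f_1, \dots, f_m$ are orthonormal lists in $V$ and $W$ such that

7.76 \[ Tv = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_m \langle v, e_m \rangle f_m \]

for every $v \in V$. Then

7.77 \[ T^*w = s_1 \langle w, f_1 \rangle e_1 + \cdots + s_m \langle w, f_m \rangle e_m \]

and

7.78 \[ T^\dagger w = \frac{\langle w, f_1 \rangle}{s_1} e_1 + \cdots + \frac{\langle w, f_m \rangle}{s_m} e_m \]

for every $w \in W$.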


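Proof If $v \in V$ and $w \in W$ then

\[ \begin{aligned} \langle Tv, w \rangle &= \bigl\langle s_1 \langle v, e_1 \rangle f_1 + \cdots + s_m \langle v, e_m \rangle f_m, w \bigr\rangle \\ &= s_1 \langle v, e_1 \rangle \langle f_1, w \rangle + \cdots + s_m \langle v, e_m \rangle \langle f_m, w \rangle \\ &= \bigl\langle v, s_1 \langle w, f_1 \rangle e_1 + \cdots + s_m \langle w, f_m \rangle e_m \bigr\rangle. \end{aligned} \]

This implies that

\[ T^*w = s_1 \langle w, f_1 \rangle e_1 + \cdots + s_m \langle w, f_m \rangle e_m, \]

proving 7.77.
To prove 7.78, suppose $w \in W$. Let

\[ v = \frac{\langle w, f_1 \rangle}{s_1} e_1 + \cdots + \frac{\langle w, f_m \rangle}{s_m} e_m. \]

Apply $T$ to both sides of the equation above, getting

\[ \begin{aligned} Tv &= \frac{\langle w, f_1 \rangle}{s_1} Te_1 + \cdots + \frac{\langle w, f_m \rangle}{s_m} Te_m \\ &= \langle w, f_1 \rangle f_1 + \cdots + \langle w, f_m \rangle f_m \\ &= P_{\operatorname{range} T}\, w, \end{aligned} \]

where the second line holds because 7.76 implies that $Te_k = s_k f_k$ if $k = 1, \dots, m$,
and the last line above holds because 7.76 implies that $f_1, \dots, f_m$ spans $\operatorname{range} T$ and
thus is an orthonormal basis of $\operatorname{range} T$ [and hence 6.57(i) applies]. The equation
above, the observation that $v \in (\operatorname{null} T)^\perp$ [see Exercise 8(b)], and the definition
of $T^\dagger w$ (see 6.68) show that $v = T^\dagger w$, proving 7.78.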

7.79 example: finding a singular value decomposition

Define $T \in \mathcal{L}(\mathbf{F}^4, \mathbf{F}^3)$ by $T(x_1, x_2, x_3, x_4) = (-5x_4, 0, x_1 + x_2)$. We want to
find a singular value decomposition of $T$. The matrix of $T$ (with respect to the
standard bases) is

\[ \begin{pmatrix} 0 & 0 & 0 & -5 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{pmatrix}. \]

Thus, as discussed in Example 7.67, the matrix of $T^*T$ is

\[ \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 25 \end{pmatrix} \]

and the positive eigenvalues of $T^*T$ are $25, 2$, with $\dim E(25, T^*T) = 1$ and
$\dim E(2, T^*T) = 1$. Hence the positive singular values of $T$ are $5, \sqrt{2}$.

Thus to find a singular value decomposition of $T$, we must find an orthonormal
list $e_1, e_2$ in $\mathbf{F}^4$ and an orthonormal list $f_1, f_2$ in $\mathbf{F}^3$ such that

\[ Tv = 5 \langle v, e_1 \rangle f_1 + \sqrt{2} \langle v, e_2 \rangle f_2 \]

for all $v \in \mathbf{F}^4$.

An orthonormal basis of $E(25, T^*T)$ is the vector $(0, 0, 0, 1)$; an orthonormal
basis of $E(2, T^*T)$ is the vector $\bigl(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0, 0\bigr)$. Thus, following the proof of 7.70,
we take

\[ e_1 = (0, 0, 0, 1) \quad\text{and}\quad e_2 = \Bigl(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0, 0\Bigr) \]

and

\[ f_1 = \frac{Te_1}{5} = (-1, 0, 0) \quad\text{and}\quad f_2 = \frac{Te_2}{\sqrt{2}} = (0, 0, 1). \]

Then, as expected, we see that $e_1, e_2$ is an orthonormal list in $\mathbf{F}^4$ and $f_1, f_2$ is an
orthonormal list in $\mathbf{F}^3$ and

\[ Tv = 5 \langle v, e_1 \rangle f_1 + \sqrt{2} \langle v, e_2 \rangle f_2 \]

for all $v \in \mathbf{F}^4$. Thus we have found a singular value decomposition of $T$.
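
The final claim of Example 7.79 can be checked numerically. The following is a minimal sketch, assuming $\mathbf{F} = \mathbf{R}$ (so no complex conjugates appear in the inner product) and using floating-point arithmetic. The function names `apply`, `dot`, `svd_apply`, and `max_error` are hypothetical helpers introduced only for this illustration.

```python
import math

# Matrix of T from Example 7.79 with respect to the standard bases.
A = [[0, 0, 0, -5],
     [0, 0, 0, 0],
     [1, 1, 0, 0]]

def apply(matrix, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def dot(u, v):
    """Standard Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

r = 1 / math.sqrt(2)
e = [[0, 0, 0, 1], [r, r, 0, 0]]   # e_1, e_2 from the example
f = [[-1, 0, 0], [0, 0, 1]]        # f_1, f_2 from the example
s = [5, math.sqrt(2)]              # positive singular values of T

def svd_apply(v):
    """Evaluate 5<v, e_1> f_1 + sqrt(2)<v, e_2> f_2."""
    out = [0.0, 0.0, 0.0]
    for s_k, e_k, f_k in zip(s, e, f):
        c = s_k * dot(v, e_k)
        out = [o + c * x for o, x in zip(out, f_k)]
    return out

def max_error(v):
    """Largest coordinate difference between Tv and the SVD formula."""
    return max(abs(a - b) for a, b in zip(apply(A, v), svd_apply(v)))

# Checking a basis suffices by linearity; one extra vector is included as well.
test_vectors = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
                [2, -3, 7, 0.5]]
errors = [max_error(v) for v in test_vectors]
print(errors)
```

Each entry of `errors` should be zero up to rounding, which is consistent with the decomposition found in the example.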


The next result translates the singular value decomposition from the context 
of linear maps to the context of matrices. Specifically, the following result gives 
a factorization of an arbitrary matrix as the product of three nice matrices. The 
proof gives an explicit construction of these three matrices in terms of the singular 
value decomposition. 

In the next result, the phrase “orthogonal columns” should be interpreted to 
mean that the columns are orthogonal with respect to the standard Euclidean inner 
product. 

7.80 matrix version of SVD

Suppose $A$ is an $M$-by-$n$ matrix of rank $m \ge 1$. Then there exist an $M$-by-$m$
matrix $B$ with orthonormal columns, an $m$-by-$m$ diagonal matrix $D$ with
positive numbers on the diagonal, and an $n$-by-$m$ matrix $C$ with orthonormal
columns such that

\[ A = BDC^*. \]


Proof Let $T \colon \mathbf{F}^n \to \mathbf{F}^M$ be the linear map whose matrix with respect to the
standard bases equals $A$. Then $\dim \operatorname{range} T = m$ (by 3.78). Let

7.81 \[ Tv = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_m \langle v, e_m \rangle f_m \]

be a singular value decomposition of $T$. Let

$B$ = the $M$-by-$m$ matrix whose columns are $f_1, \dots, f_m$,
$D$ = the $m$-by-$m$ diagonal matrix whose diagonal entries are $s_1, \dots, s_m$,
$C$ = the $n$-by-$m$ matrix whose columns are $e_1, \dots, e_m$.

Let $u_1, \dots, u_m$ denote the standard basis of $\mathbf{F}^m$. If $k \in \{1, \dots, m\}$ then

\[ (AC - BD)u_k = Ae_k - B(s_k u_k) = s_k f_k - s_k f_k = 0. \]

Thus $AC = BD$.
Multiply both sides of this last equation by $C^*$ (the conjugate transpose of $C$)
on the right to get

\[ ACC^* = BDC^*. \]

Note that the rows of $C^*$ are the complex conjugates of $e_1, \dots, e_m$. Thus if
$k \in \{1, \dots, m\}$, then the definition of matrix multiplication shows that $C^*e_k = u_k$;
hence $CC^*e_k = e_k$. Thus $ACC^*v = Av$ for all $v \in \operatorname{span}(e_1, \dots, e_m)$.

If $v \in (\operatorname{span}(e_1, \dots, e_m))^\perp$, then $Av = 0$ (as follows from 7.81) and $C^*v = 0$
(as follows from the definition of matrix multiplication). Hence $ACC^*v = Av$ for
all $v \in (\operatorname{span}(e_1, \dots, e_m))^\perp$.

Because $ACC^*$ and $A$ agree on $\operatorname{span}(e_1, \dots, e_m)$ and on $(\operatorname{span}(e_1, \dots, e_m))^\perp$, we
conclude that $ACC^* = A$. Thus the displayed equation above becomes

\[ A = BDC^*, \]


as desired. 


Note that the matrix A in the result above has Mn entries. In comparison, the 
matrices B, D, and C above have a total of 


\[ m(M + m + n) \]


entries. Thus if M and n are large numbers and the rank m is considerably less 
than M and n, then the number of entries that must be stored on a computer to 
represent A is considerably less than Mn. 
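
As a hedged illustration of the factorization and of the storage count, the sketch below builds $B$, $D$, and $C$ for the matrix of Example 7.79 and multiplies them back together. It assumes real entries, so the conjugate transpose $C^*$ is just the transpose. The helper names and the sample sizes $M = n = 1000$, $m = 10$ are assumptions made for this illustration only.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose; equals the conjugate transpose for real matrices."""
    return [list(col) for col in zip(*X)]

r = 1 / math.sqrt(2)
A = [[0, 0, 0, -5], [0, 0, 0, 0], [1, 1, 0, 0]]   # M = 3, n = 4, rank m = 2
B = [[-1, 0], [0, 0], [0, 1]]                     # columns f_1, f_2
D = [[5, 0], [0, math.sqrt(2)]]                   # diagonal entries s_1, s_2
C = [[0, r], [0, r], [0, 0], [1, 0]]              # columns e_1, e_2

# Reassemble B D C* and compare with A entry by entry.
product = matmul(matmul(B, D), transpose(C))
max_error = max(abs(product[i][j] - A[i][j])
                for i in range(3) for j in range(4))

def entries_full(M, n):
    """Number of entries of an M-by-n matrix."""
    return M * n

def entries_factored(M, n, m):
    """Total number of entries of B, D, and C, namely m(M + m + n)."""
    return m * (M + m + n)

print(max_error)
print(entries_full(1000, 1000), entries_factored(1000, 1000, 10))
```

For the tiny matrix above the factored form actually needs more entries than $A$ itself (18 versus 12); the savings appear only when the rank is much smaller than both dimensions, as in the second printed line.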

Exercises 7E

1 Suppose $T \in \mathcal{L}(V, W)$. Show that $T = 0$ if and only if all singular values of
$T$ are 0.

2 Suppose $T \in \mathcal{L}(V, W)$ and $s > 0$. Prove that $s$ is a singular value of $T$ if and
only if there exist nonzero vectors $v \in V$ and $w \in W$ such that

\[ Tv = sw \quad\text{and}\quad T^*w = sv. \]

The vectors $v, w$ satisfying both equations above are called a Schmidt pair.
Erhard Schmidt introduced the concept of singular values in 1907.

3 Give an example of $T \in \mathcal{L}(\mathbf{C}^2)$ such that 0 is the only eigenvalue of $T$ and
the singular values of $T$ are $5, 0$.

4 Suppose that $T \in \mathcal{L}(V, W)$, $s_1$ is the largest singular value of $T$, and $s_n$ is
the smallest singular value of $T$. Prove that

\[ \{ \|Tv\| : v \in V \text{ and } \|v\| = 1 \} = [s_n, s_1]. \]

5 Suppose $T \in \mathcal{L}(\mathbf{C}^2)$ is defined by $T(x, y) = (-4y, x)$. Find the singular
values of $T$.

6 Find the singular values of the differentiation operator $D \in \mathcal{L}(\mathcal{P}_2(\mathbf{R}))$
defined by $Dp = p'$, where the inner product on $\mathcal{P}_2(\mathbf{R})$ is as in Example 6.34.

7 Suppose that $T \in \mathcal{L}(V)$ is self-adjoint or that $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$ is
normal. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $T$, each included in this list
as many times as the dimension of the corresponding eigenspace. Show
that the singular values of $T$ are $|\lambda_1|, \dots, |\lambda_n|$, after these numbers have been
sorted into decreasing order.

8 Suppose $T \in \mathcal{L}(V, W)$. Suppose $s_1 \ge s_2 \ge \cdots \ge s_m > 0$ and $e_1, \dots, e_m$ is an
orthonormal list in $V$ and $f_1, \dots, f_m$ is an orthonormal list in $W$ such that

\[ Tv = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_m \langle v, e_m \rangle f_m \]

for every $v \in V$.

(a) Prove that $f_1, \dots, f_m$ is an orthonormal basis of $\operatorname{range} T$.

(b) Prove that $e_1, \dots, e_m$ is an orthonormal basis of $(\operatorname{null} T)^\perp$.

(c) Prove that $s_1, \dots, s_m$ are the positive singular values of $T$.

(d) Prove that if $k \in \{1, \dots, m\}$, then $e_k$ is an eigenvector of $T^*T$ with
corresponding eigenvalue $s_k^2$.

(e) Prove that

\[ TT^*w = s_1^2 \langle w, f_1 \rangle f_1 + \cdots + s_m^2 \langle w, f_m \rangle f_m \]

for all $w \in W$.

9 Suppose $T \in \mathcal{L}(V, W)$. Show that $T$ and $T^*$ have the same positive singular
values.

10 Suppose $T \in \mathcal{L}(V, W)$ has singular values $s_1, \dots, s_n$. Prove that if $T$ is an
invertible linear map, then $T^{-1}$ has singular values

\[ \frac{1}{s_n}, \dots, \frac{1}{s_1}. \]

11 Suppose that $T \in \mathcal{L}(V, W)$ and $v_1, \dots, v_n$ is an orthonormal basis of $V$. Let
$s_1, \dots, s_n$ denote the singular values of $T$.

(a) Prove that $\|Tv_1\|^2 + \cdots + \|Tv_n\|^2 = s_1^2 + \cdots + s_n^2$.

(b) Prove that if $W = V$ and $T$ is a positive operator, then

\[ \langle Tv_1, v_1 \rangle + \cdots + \langle Tv_n, v_n \rangle = s_1 + \cdots + s_n. \]

See the comment after Exercise 5 in Section 7A.

12 (a) Give an example of a finite-dimensional vector space and an operator $T$
on it such that the singular values of $T^2$ do not equal the squares of the
singular values of $T$.

(b) Suppose $T \in \mathcal{L}(V)$ is normal. Prove that the singular values of $T^2$
equal the squares of the singular values of $T$.

13 Suppose $T_1, T_2 \in \mathcal{L}(V)$. Prove that $T_1$ and $T_2$ have the same singular
values if and only if there exist unitary operators $S_1, S_2 \in \mathcal{L}(V)$ such that
$T_1 = S_1 T_2 S_2$.

14 Suppose $T \in \mathcal{L}(V, W)$. Let $s_n$ denote the smallest singular value of $T$. Prove
that $s_n \|v\| \le \|Tv\|$ for every $v \in V$.

15 Suppose $T \in \mathcal{L}(V)$ and $s_1 \ge \cdots \ge s_n$ are the singular values of $T$. Prove
that if $\lambda$ is an eigenvalue of $T$, then $s_1 \ge |\lambda| \ge s_n$.

16 Suppose $T \in \mathcal{L}(V, W)$. Prove that $(T^*)^\dagger = (T^\dagger)^*$.
Compare the result in this exercise to the analogous result for invertible
linear maps [see 7.5(f)].

17 Suppose $T \in \mathcal{L}(V)$. Prove that $T$ is self-adjoint if and only if $T^\dagger$ is
self-adjoint.


Matrices unfold 
Singular values gleam like stars 
Order in chaos shines 


—written by ChatGPT with input haiku about SVD 

7F Consequences of Singular Value Decomposition


Norms of Linear Maps 


The singular value decomposition leads to the following upper bound for $\|Tv\|$.

7.82 upper bound for $\|Tv\|$

Suppose $T \in \mathcal{L}(V, W)$. Let $s_1$ be the largest singular value of $T$. Then

\[ \|Tv\| \le s_1 \|v\| \]

for all $v \in V$.


Proof Let $s_1, \dots, s_m$ denote the positive singular values of $T$, and let $e_1, \dots, e_m$ be
an orthonormal list in $V$ and $f_1, \dots, f_m$ be an orthonormal list in $W$ that provide a
singular value decomposition of $T$. Thus

7.83 \[ Tv = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_m \langle v, e_m \rangle f_m \]

for all $v \in V$. Hence if $v \in V$ then

\[ \begin{aligned} \|Tv\|^2 &= s_1^2 |\langle v, e_1 \rangle|^2 + \cdots + s_m^2 |\langle v, e_m \rangle|^2 \\ &\le s_1^2 \bigl( |\langle v, e_1 \rangle|^2 + \cdots + |\langle v, e_m \rangle|^2 \bigr) \\ &\le s_1^2 \|v\|^2, \end{aligned} \]

where the last inequality follows from Bessel's inequality (6.26). Taking square
roots of both sides of the inequality above shows that $\|Tv\| \le s_1 \|v\|$, as desired.

For a lower bound on $\|Tv\|$, look at Exercise 14 in Section 7E.

Suppose $T \in \mathcal{L}(V, W)$ and $s_1$ is the largest singular value of $T$. The result
above shows that

7.84 \[ \|Tv\| \le s_1 \text{ for all } v \in V \text{ with } \|v\| \le 1. \]

Taking $v = e_1$ in 7.83 shows that $Te_1 = s_1 f_1$. Because $\|f_1\| = 1$, this implies that
$\|Te_1\| = s_1$. Thus because $\|e_1\| = 1$, the inequality in 7.84 leads to the equation

7.85 \[ \max\{ \|Tv\| : v \in V \text{ and } \|v\| \le 1 \} = s_1. \]


The equation above is the motivation for the following definition, which defines 
the norm of T to be the left side of the equation above without needing to refer to 
singular values or the singular value decomposition. 


7.86 definition: norm of a linear map, $\|\cdot\|$

Suppose $T \in \mathcal{L}(V, W)$. Then the norm of $T$, denoted by $\|T\|$, is defined by

\[ \|T\| = \max\{ \|Tv\| : v \in V \text{ and } \|v\| \le 1 \}. \]

In general, the maximum of an infinite set of nonnegative numbers need
not exist. However, the discussion before 7.86 shows that the maximum in the
definition of the norm of a linear map $T$ from $V$ to $W$ does indeed exist (and equals
the largest singular value of $T$).

We now have two different uses of the word norm and the notation $\|\cdot\|$. Our
first use of this notation was in connection with an inner product on $V$, when we
defined $\|v\| = \sqrt{\langle v, v \rangle}$ for each $v \in V$. Our second use of the norm notation and
terminology is with the definition we just made of $\|T\|$ for $T \in \mathcal{L}(V, W)$. The
norm $\|T\|$ for $T \in \mathcal{L}(V, W)$ does not usually come from taking an inner product
of $T$ with itself (see Exercise 21). You should be able to tell from the context and
from the symbols used which meaning of the norm is intended.

The properties of the norm on £(V, W) listed below look identical to properties 
of the norm on an inner product space (see 6.9 and 6.17). The inequality in (d) is 
called the triangle inequality, thus using the same terminology that we used for 
the norm on V. For the reverse triangle inequality, see Exercise 1. 


7.87 basic properties of norms of linear maps 


Suppose $T \in \mathcal{L}(V, W)$. Then

(a) $\|T\| \ge 0$;
(b) $\|T\| = 0 \iff T = 0$;
(c) $\|\lambda T\| = |\lambda| \, \|T\|$ for all $\lambda \in \mathbf{F}$;
(d) $\|S + T\| \le \|S\| + \|T\|$ for all $S \in \mathcal{L}(V, W)$.


Proof
(a) Because $\|Tv\| \ge 0$ for every $v \in V$, the definition of $\|T\|$ implies that $\|T\| \ge 0$.

(b) Suppose $\|T\| = 0$. Thus $Tv = 0$ for all $v \in V$ with $\|v\| \le 1$. If $u \in V$ with
$u \ne 0$, then

\[ Tu = \|u\| \, T\!\left( \frac{u}{\|u\|} \right) = 0, \]

where the last equality holds because $u/\|u\|$ has norm 1. Because $Tu = 0$ for
all $u \in V$, we have $T = 0$.

Conversely, if $T = 0$ then $Tv = 0$ for all $v \in V$ and hence $\|T\| = 0$.

(c) Suppose $\lambda \in \mathbf{F}$. Then

\[ \begin{aligned} \|\lambda T\| &= \max\{ \|\lambda Tv\| : v \in V \text{ and } \|v\| \le 1 \} \\ &= |\lambda| \max\{ \|Tv\| : v \in V \text{ and } \|v\| \le 1 \} \\ &= |\lambda| \, \|T\|. \end{aligned} \]

(d) Suppose $S \in \mathcal{L}(V, W)$. The definition of $\|S + T\|$ implies that there exists
$v \in V$ such that $\|v\| \le 1$ and $\|S + T\| = \|(S + T)v\|$. Now

\[ \|S + T\| = \|(S + T)v\| = \|Sv + Tv\| \le \|Sv\| + \|Tv\| \le \|S\| + \|T\|, \]

completing the proof of (d).

For $S, T \in \mathcal{L}(V, W)$, the quantity $\|S - T\|$ is often called the distance between
$S$ and $T$. Informally, think of the condition that $\|S - T\|$ is a small number as
meaning that $S$ and $T$ are close together. For example, Exercise 9 asserts that for
every $T \in \mathcal{L}(V)$, there is an invertible operator as close to $T$ as we wish.


7.88 alternative formulas for $\|T\|$

Suppose $T \in \mathcal{L}(V, W)$. Then

(a) $\|T\|$ = the largest singular value of $T$;

(b) $\|T\| = \max\{ \|Tv\| : v \in V \text{ and } \|v\| = 1 \}$;

(c) $\|T\|$ = the smallest number $c$ such that $\|Tv\| \le c\|v\|$ for all $v \in V$.


Proof

(a) See 7.85.

(b) Let $v \in V$ be such that $0 < \|v\| \le 1$. Let $u = v/\|v\|$. Then

\[ \|u\| = \left\| \frac{v}{\|v\|} \right\| = 1 \quad\text{and}\quad \|Tu\| = \left\| T\!\left( \frac{v}{\|v\|} \right) \right\| = \frac{\|Tv\|}{\|v\|} \ge \|Tv\|. \]

Thus when finding the maximum of $\|Tv\|$ with $\|v\| \le 1$, we can restrict
attention to vectors in $V$ with norm 1, proving (b).

(c) Suppose $v \in V$ and $v \ne 0$. Then the definition of $\|T\|$ implies that

\[ \left\| T\!\left( \frac{v}{\|v\|} \right) \right\| \le \|T\|, \]

which implies that

7.89 \[ \|Tv\| \le \|T\| \, \|v\|. \]

Now suppose $c \ge 0$ and $\|Tv\| \le c\|v\|$ for all $v \in V$. This implies that

\[ \|Tv\| \le c \]

for all $v \in V$ with $\|v\| \le 1$. Taking the maximum of the left side of the
inequality above over all $v \in V$ with $\|v\| \le 1$ shows that $\|T\| \le c$. Thus $\|T\|$ is
the smallest number $c$ such that $\|Tv\| \le c\|v\|$ for all $v \in V$.


When working with norms of linear maps, you will probably frequently use 
the inequality 7.89. 

For computing an approximation of the norm of a linear map T given the 
matrix of T with respect to some orthonormal bases, 7.88(a) is likely to be most 
useful. The matrix of T*T is quickly computable from matrix multiplication. 
Then a computer can be asked to find an approximation for the largest eigenvalue 
of T*T (excellent numeric algorithms exist for this purpose). Then taking the 
square root and using 7.88(a) gives an approximation for the norm of T (which 
usually cannot be computed exactly). 
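
The procedure described above can be sketched with power iteration, which is one simple way (chosen here for illustration, not a method named in the text) to approximate the largest eigenvalue of $T^*T$. The sketch assumes a real matrix, so the adjoint is the transpose, and it assumes the starting vector is not orthogonal to the eigenspace of the largest eigenvalue. The function name `operator_norm_estimate` and the iteration count are hypothetical choices.

```python
import math

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def operator_norm_estimate(A, iterations=200):
    """Estimate ||T|| by power iteration on the matrix of T*T.

    The returned value is ||Av|| for a unit vector v, so it never
    exceeds the true norm; it approaches the norm when the start
    vector has a nonzero component along a top eigenvector of T*T.
    """
    At = transpose(A)
    n = len(A[0])
    v = [1.0 / math.sqrt(n)] * n          # unit start vector (an assumption)
    for _ in range(iterations):
        w = matvec(At, matvec(A, v))      # apply T*T
        length = norm(w)
        if length == 0.0:                 # v lies in null T; stop iterating
            break
        v = [x / length for x in w]
    return norm(matvec(A, v))

example_norm = operator_norm_estimate([[0, 0, 0, -5], [0, 0, 0, 0], [1, 1, 0, 0]])
ones_norm = operator_norm_estimate([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
print(example_norm, ones_norm)
```

The first matrix is the one from Example 7.79, whose largest singular value is 5. The second is the 3-by-3 matrix of all 1's, whose norm is 3.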

You should verify all assertions in the example below.

7.90 example: norms

• If $I$ denotes the usual identity operator on $V$, then $\|I\| = 1$.

• If $T \in \mathcal{L}(\mathbf{F}^n)$ and the matrix of $T$ with respect to the standard basis of $\mathbf{F}^n$
consists of all 1's, then $\|T\| = n$.

• If $T \in \mathcal{L}(V)$ and $V$ has an orthonormal basis consisting of eigenvectors of
$T$ with corresponding eigenvalues $\lambda_1, \dots, \lambda_n$, then $\|T\|$ is the maximum of the
numbers $|\lambda_1|, \dots, |\lambda_n|$.

• Suppose $T \in \mathcal{L}(\mathbf{R}^5)$ is the operator whose matrix (with respect to the
standard basis) is the 5-by-5 matrix whose entry in row $j$, column $k$ is $1/(j^2 + k)$.
Standard mathematical software shows that the largest singular value of $T$ is
approximately 0.8 and the smallest singular value of $T$ is approximately $10^{-6}$.
Thus $\|T\| \approx 0.8$ and (using Exercise 10 in Section 7E) $\|T^{-1}\| \approx 10^6$. It is not
possible to find exact formulas for these norms.


A linear map and its adjoint have the same norm, as shown by the next result. 


7.91 norm of the adjoint 


Suppose $T \in \mathcal{L}(V, W)$. Then $\|T^*\| = \|T\|$.

Proof Suppose $w \in W$. Then

\[ \|T^*w\|^2 = \langle T^*w, T^*w \rangle = \langle TT^*w, w \rangle \le \|TT^*w\| \, \|w\| \le \|T\| \, \|T^*w\| \, \|w\|. \]

The inequality above implies that

\[ \|T^*w\| \le \|T\| \, \|w\|, \]

which along with 7.88(c) implies that $\|T^*\| \le \|T\|$.
Replacing $T$ with $T^*$ in the inequality $\|T^*\| \le \|T\|$ and then using the equation
$(T^*)^* = T$ shows that $\|T\| \le \|T^*\|$. Thus $\|T^*\| = \|T\|$, as desired.


You may want to construct an alternative proof of the result above using 
Exercise 9 in Section 7E, which asserts that a linear map and its adjoint have the 
same positive singular values. 


Approximation by Linear Maps with Lower-Dimensional Range 


The next result is a spectacular application of the singular value decomposition.
It says that to best approximate a linear map by a linear map whose range has
dimension at most $k$, chop off the singular value decomposition after the first
$k$ terms. Specifically, the linear map $T_k$ in the next result has the property that
$\dim \operatorname{range} T_k = k$ and $T_k$ minimizes the distance to $T$ among all linear maps with
range of dimension at most $k$. This result leads to algorithms for compressing
huge matrices while preserving their most important information.

7.92 best approximation by linear map whose range has dimension $\le k$

Suppose $T \in \mathcal{L}(V, W)$ and $s_1 \ge \cdots \ge s_m$ are the positive singular values of $T$.
Suppose $1 \le k < m$. Then

\[ \min\{ \|T - S\| : S \in \mathcal{L}(V, W) \text{ and } \dim \operatorname{range} S \le k \} = s_{k+1}. \]

Furthermore, if

\[ Tv = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_m \langle v, e_m \rangle f_m \]

is a singular value decomposition of $T$ and $T_k \in \mathcal{L}(V, W)$ is defined by

\[ T_k v = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_k \langle v, e_k \rangle f_k \]

for each $v \in V$, then $\dim \operatorname{range} T_k = k$ and $\|T - T_k\| = s_{k+1}$.


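Proof If $v \in V$ then

\[ \begin{aligned} \|(T - T_k)v\|^2 &= \| s_{k+1} \langle v, e_{k+1} \rangle f_{k+1} + \cdots + s_m \langle v, e_m \rangle f_m \|^2 \\ &= s_{k+1}^2 |\langle v, e_{k+1} \rangle|^2 + \cdots + s_m^2 |\langle v, e_m \rangle|^2 \\ &\le s_{k+1}^2 \bigl( |\langle v, e_{k+1} \rangle|^2 + \cdots + |\langle v, e_m \rangle|^2 \bigr) \\ &\le s_{k+1}^2 \|v\|^2. \end{aligned} \]

Thus $\|T - T_k\| \le s_{k+1}$. The equation $(T - T_k)e_{k+1} = s_{k+1} f_{k+1}$ now shows that
$\|T - T_k\| = s_{k+1}$.

Suppose $S \in \mathcal{L}(V, W)$ and $\dim \operatorname{range} S \le k$. Thus $Se_1, \dots, Se_{k+1}$, which is a
list of length $k + 1$, is linearly dependent. Hence there exist $a_1, \dots, a_{k+1} \in \mathbf{F}$, not
all 0, such that

\[ a_1 Se_1 + \cdots + a_{k+1} Se_{k+1} = 0. \]

Now $a_1 e_1 + \cdots + a_{k+1} e_{k+1} \ne 0$ because $a_1, \dots, a_{k+1}$ are not all 0. We have

\[ \begin{aligned} \|(T - S)(a_1 e_1 + \cdots + a_{k+1} e_{k+1})\|^2 &= \|T(a_1 e_1 + \cdots + a_{k+1} e_{k+1})\|^2 \\ &= \| s_1 a_1 f_1 + \cdots + s_{k+1} a_{k+1} f_{k+1} \|^2 \\ &= s_1^2 |a_1|^2 + \cdots + s_{k+1}^2 |a_{k+1}|^2 \\ &\ge s_{k+1}^2 \bigl( |a_1|^2 + \cdots + |a_{k+1}|^2 \bigr) \\ &= s_{k+1}^2 \| a_1 e_1 + \cdots + a_{k+1} e_{k+1} \|^2. \end{aligned} \]

Because $a_1 e_1 + \cdots + a_{k+1} e_{k+1} \ne 0$, the inequality above implies that

\[ \|T - S\| \ge s_{k+1}. \]

Thus $S = T_k$ minimizes $\|T - S\|$ among $S \in \mathcal{L}(V, W)$ with $\dim \operatorname{range} S \le k$.

For other examples of the use of the singular value decomposition in best
approximation, see Exercise 22, which finds a subspace of given dimension on
which the restriction of a linear map is as small as possible, and Exercise 27,
which finds a unitary operator that is as close as possible to a given operator.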
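
To make the truncation concrete, the sketch below applies it with $k = 1$ to the linear map of Example 7.79, whose positive singular values are $5$ and $\sqrt{2}$. It is only an illustration under the assumption $\mathbf{F} = \mathbf{R}$. The helper names (`truncated`, `random_unit`) and the random sampling are choices made for this sketch, and sampling gives evidence for the bound rather than a proof of it.

```python
import math
import random

A = [[0, 0, 0, -5], [0, 0, 0, 0], [1, 1, 0, 0]]   # matrix of T (Example 7.79)
r = 1 / math.sqrt(2)
e = [[0, 0, 0, 1], [r, r, 0, 0]]                  # e_1, e_2
f = [[-1, 0, 0], [0, 0, 1]]                       # f_1, f_2
s = [5, math.sqrt(2)]                             # s_1 >= s_2 > 0

def truncated(k):
    """Matrix of T_k: the first k terms s_j <v, e_j> f_j of the SVD."""
    rows, cols = len(f[0]), len(e[0])
    M = [[0.0] * cols for _ in range(rows)]
    for j in range(k):
        for a in range(rows):
            for b in range(cols):
                M[a][b] += s[j] * f[j][a] * e[j][b]
    return M

def apply(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

# Keeping all terms (k = 2) should reproduce the matrix of T.
T2 = truncated(2)
full_error = max(abs(T2[a][b] - A[a][b]) for a in range(3) for b in range(4))

# Matrix of T - T_1.
T1 = truncated(1)
diff = [[A[a][b] - T1[a][b] for b in range(4)] for a in range(3)]

# The value s_2 is attained at the unit vector e_2 ...
attained = norm(apply(diff, e[1]))

# ... and sampled unit vectors should never give a larger value.
random.seed(0)

def random_unit(n):
    """A random unit vector in R^n (normalized Gaussian sample)."""
    v = [random.gauss(0, 1) for _ in range(n)]
    length = norm(v)
    return [x / length for x in v]

sampled_max = max(norm(apply(diff, random_unit(4))) for _ in range(1000))
print(full_error, attained, sampled_max)
```

Here `attained` should agree with $s_2 = \sqrt{2}$ up to rounding, matching the claim $\|T - T_1\| = s_2$.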

Polar Decomposition

Recall our discussion before 7.54 of the analogy between complex numbers $z$
with $|z| = 1$ and unitary operators. Continuing with this analogy, note that every
complex number $z$ except 0 can be written in the form

\[ z = \Bigl( \frac{z}{|z|} \Bigr) |z| = \Bigl( \frac{z}{|z|} \Bigr) \sqrt{\bar{z} z}, \]

where the first factor, namely, $z/|z|$, has absolute value 1.

Our analogy leads us to guess that every operator $T \in \mathcal{L}(V)$ can be written as
a unitary operator times $\sqrt{T^*T}$. That guess is indeed correct. The corresponding
result is called the polar decomposition, which gives a beautiful description of an
arbitrary operator on $V$.

Note that if $T \in \mathcal{L}(V)$, then $T^*T$ is a positive operator [as was shown in
7.64(a)]. Thus the operator $\sqrt{T^*T}$ makes sense and is well defined as a positive
operator on $V$.

The polar decomposition that we are about to state and prove says that every 
operator on V is the product of a unitary operator and a positive operator. Thus 
we can write an arbitrary operator on V as the product of two nice operators, 
each of which comes from a class that we can completely describe and that we 
understand reasonably well. The unitary operators are described by 7.55 if F = C; 
the positive operators are described by the real and complex spectral theorems 
(7.29 and 7.31). 

Specifically, consider the case $\mathbf{F} = \mathbf{C}$, and suppose

\[ T = S\sqrt{T^*T} \]

is a polar decomposition of an operator $T \in \mathcal{L}(V)$, where $S$ is a unitary operator.
Then there is an orthonormal basis of $V$ with respect to which $S$ has a diagonal
matrix, and there is an orthonormal basis of $V$ with respect to which $\sqrt{T^*T}$ has
a diagonal matrix. Warning: There may not exist an orthonormal basis that
simultaneously puts the matrices of both $S$ and $\sqrt{T^*T}$ into these nice diagonal
forms; $S$ may require one orthonormal basis and $\sqrt{T^*T}$ may require a different
orthonormal basis.

However (still assuming that $\mathbf{F} = \mathbf{C}$), if $T$ is normal, then an orthonormal
basis of $V$ can be chosen such that both $S$ and $\sqrt{T^*T}$ have diagonal matrices with
respect to this basis (see Exercise 31). The converse is also true: If $T \in \mathcal{L}(V)$
and $T = S\sqrt{T^*T}$ for some unitary operator $S \in \mathcal{L}(V)$ such that $S$ and $\sqrt{T^*T}$ both
have diagonal matrices with respect to the same orthonormal basis of $V$, then $T$
is normal. This holds because $T$ then has a diagonal matrix with respect to this
same orthonormal basis, which implies that $T$ is normal [by the equivalence of
(c) and (a) in 7.31].

The polar decomposition below is valid on both real and complex inner product
spaces and for all operators on those spaces.


7.93 polar decomposition 


Suppose $T \in \mathcal{L}(V)$. Then there exists a unitary operator $S \in \mathcal{L}(V)$ such that

\[ T = S\sqrt{T^*T}. \]


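Proof Let $s_1, \dots, s_m$ be the positive singular values of $T$, and let $e_1, \dots, e_m$ and
$f_1, \dots, f_m$ be orthonormal lists in $V$ such that

7.94 \[ Tv = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_m \langle v, e_m \rangle f_m \]

for every $v \in V$. Extend $e_1, \dots, e_m$ and $f_1, \dots, f_m$ to orthonormal bases $e_1, \dots, e_n$
and $f_1, \dots, f_n$ of $V$.
Define $S \in \mathcal{L}(V)$ by

\[ Sv = \langle v, e_1 \rangle f_1 + \cdots + \langle v, e_n \rangle f_n \]

for each $v \in V$. Then

\[ \begin{aligned} \|Sv\|^2 &= \| \langle v, e_1 \rangle f_1 + \cdots + \langle v, e_n \rangle f_n \|^2 \\ &= |\langle v, e_1 \rangle|^2 + \cdots + |\langle v, e_n \rangle|^2 \\ &= \|v\|^2. \end{aligned} \]

Thus $S$ is a unitary operator.
Applying $T^*$ to both sides of 7.94 and then using the formula for $T^*$ given by
7.77 shows that

\[ T^*Tv = s_1^2 \langle v, e_1 \rangle e_1 + \cdots + s_m^2 \langle v, e_m \rangle e_m \]

for every $v \in V$. Thus if $v \in V$, then

\[ \sqrt{T^*T}\, v = s_1 \langle v, e_1 \rangle e_1 + \cdots + s_m \langle v, e_m \rangle e_m \]

because the operator that sends $v$ to the right side of the equation above is a
positive operator whose square equals $T^*T$. Now

\[ \begin{aligned} S\sqrt{T^*T}\, v &= S\bigl( s_1 \langle v, e_1 \rangle e_1 + \cdots + s_m \langle v, e_m \rangle e_m \bigr) \\ &= s_1 \langle v, e_1 \rangle f_1 + \cdots + s_m \langle v, e_m \rangle f_m \\ &= Tv, \end{aligned} \]

where the last equation follows from 7.94.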


Exercise 27 shows that the unitary operator S produced in the proof above is 
as close as a unitary operator can be to T. 

Alternative proofs of the polar decomposition directly use the spectral theorem, 
avoiding the singular value decomposition. However, the proof above seems 
cleaner than those alternative proofs. 
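
A small numerical instance may help fix ideas. The sketch below computes a polar decomposition of one invertible real 2-by-2 matrix. It takes a different route from the proof above: it forms $\sqrt{T^*T}$ with the closed formula $(M + sI)/t$, where $s = \sqrt{\det M}$ and $t = \sqrt{\operatorname{trace} M + 2s}$ (valid for 2-by-2 positive definite $M$, as the Cayley-Hamilton theorem shows), and then sets $S = T(\sqrt{T^*T})^{-1}$, which requires $T$ to be invertible. The matrix `A` and all function names are assumptions made for this illustration.

```python
import math

def matmul2(X, Y):
    """Product of two 2-by-2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(X):
    """Transpose of a 2-by-2 matrix (the adjoint, for real entries)."""
    return [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]

def inverse2(X):
    """Inverse of an invertible 2-by-2 matrix."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

def sqrt_positive_2x2(M):
    """Positive square root of a 2-by-2 positive definite real matrix."""
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t],
            [M[1][0] / t, (M[1][1] + s) / t]]

def max_diff(X, Y):
    """Largest absolute entrywise difference of two 2-by-2 matrices."""
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))

A = [[1.0, 2.0], [3.0, 4.0]]                 # an invertible operator on R^2
AtA = matmul2(transpose2(A), A)              # matrix of T*T
P = sqrt_positive_2x2(AtA)                   # matrix of sqrt(T*T)
S = matmul2(A, inverse2(P))                  # candidate unitary factor

identity = [[1.0, 0.0], [0.0, 1.0]]
square_error = max_diff(matmul2(P, P), AtA)                  # P^2 = T*T
unitary_error = max_diff(matmul2(transpose2(S), S), identity) # S*S = I
polar_error = max_diff(matmul2(S, P), A)                      # S P = T
print(square_error, unitary_error, polar_error)
```

All three printed errors should be zero up to rounding, which is consistent with $T = S\sqrt{T^*T}$ for this particular matrix.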

Operators Applied to Ellipsoids and Parallelepipeds

The ball in $V$ of radius 1 centered at 0, denoted by $B$, is defined by

\[ B = \{ v \in V : \|v\| < 1 \}. \]

[Figure: the ball $B$ in $\mathbf{R}^2$.]

If $\dim V = 2$, the word disk is sometimes used instead of
ball. However, using ball in all dimensions is less confusing.
Similarly, if $\dim V = 2$, then the word ellipse is sometimes
used instead of the word ellipsoid that we are about to define.
Again, using ellipsoid in all dimensions is less confusing.

You can think of the ellipsoid defined below as obtained
by starting with the ball $B$ and then stretching by a factor of
$s_k$ along each $f_k$-axis.


Suppose that $f_1, \dots, f_n$ is an orthonormal basis of $V$ and $s_1, \dots, s_n$ are positive
numbers. The ellipsoid $E(s_1 f_1, \dots, s_n f_n)$ with principal axes $s_1 f_1, \dots, s_n f_n$ is
defined by

\[ E(s_1 f_1, \dots, s_n f_n) = \Bigl\{ v \in V : \frac{|\langle v, f_1 \rangle|^2}{s_1^2} + \cdots + \frac{|\langle v, f_n \rangle|^2}{s_n^2} < 1 \Bigr\}. \]

The ellipsoid notation $E(s_1 f_1, \dots, s_n f_n)$ does not explicitly include the inner
product space $V$, even though the definition above depends on $V$. However, the
inner product space $V$ should be clear from the context and also from the requirement
that $f_1, \dots, f_n$ be an orthonormal basis of $V$.


7.97 example: ellipsoids

[Figure: the ellipsoid $E(2f_1, f_2)$ in $\mathbf{R}^2$, where $f_1, f_2$ is the standard basis of $\mathbf{R}^2$.]

[Figure: the ellipsoid $E(2f_1, f_2)$ in $\mathbf{R}^2$, where $f_1 = \bigl(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\bigr)$ and $f_2 = \bigl(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\bigr)$.]

[Figure: the ellipsoid $E(4f_1, 3f_2, 2f_3)$ in $\mathbf{R}^3$, where $f_1, f_2, f_3$ is the standard basis of $\mathbf{R}^3$.]

The ellipsoid $E(f_1, \dots, f_n)$ equals the ball $B$ in $V$ for every orthonormal basis
$f_1, \dots, f_n$ of $V$ [by Parseval's identity 6.30(b)].


For $T$ a function defined on $V$ and $\Omega \subseteq V$, define $T(\Omega)$ by

\[ T(\Omega) = \{ Tv : v \in \Omega \}. \]

Thus if $T$ is a function defined on $V$, then $T(V) = \operatorname{range} T$.

The next result states that every invertible operator $T \in \mathcal{L}(V)$ maps the ball
$B$ in $V$ onto an ellipsoid in $V$. The proof shows that the principal axes of this
ellipsoid come from the singular value decomposition of $T$.


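7.99 invertible operator takes ball to ellipsoid

Suppose $T \in \mathcal{L}(V)$ is invertible. Then $T$ maps the ball $B$ in $V$ onto an ellipsoid in $V$.

Proof Suppose $T$ has singular value decomposition

7.100 \[ Tv = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_n \langle v, e_n \rangle f_n \]

for all $v \in V$, where $s_1, \dots, s_n$ are the singular values of $T$ and $e_1, \dots, e_n$ and $f_1, \dots, f_n$
are both orthonormal bases of $V$. We will show that $T(B) = E(s_1 f_1, \dots, s_n f_n)$.

First suppose $v \in B$. Because $T$ is invertible, none of the singular values
$s_1, \dots, s_n$ equals 0 (see 7.68). Thus 7.100 implies that

\[ \frac{|\langle Tv, f_1 \rangle|^2}{s_1^2} + \cdots + \frac{|\langle Tv, f_n \rangle|^2}{s_n^2} = |\langle v, e_1 \rangle|^2 + \cdots + |\langle v, e_n \rangle|^2 < 1. \]

Thus $Tv \in E(s_1 f_1, \dots, s_n f_n)$. Hence $T(B) \subseteq E(s_1 f_1, \dots, s_n f_n)$.
To prove inclusion in the other direction, now suppose $w \in E(s_1 f_1, \dots, s_n f_n)$.
Let

\[ v = \frac{\langle w, f_1 \rangle}{s_1} e_1 + \cdots + \frac{\langle w, f_n \rangle}{s_n} e_n. \]

Then $\|v\| < 1$ and 7.100 implies that $Tv = \langle w, f_1 \rangle f_1 + \cdots + \langle w, f_n \rangle f_n = w$. Thus
$T(B) \supseteq E(s_1 f_1, \dots, s_n f_n)$.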
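
The result that an invertible operator maps the ball onto an ellipsoid can be spot-checked on a small example. The sketch below uses the operator on $\mathbf{R}^2$ with matrix rows $(0, -2)$ and $(3, 0)$, which is an assumption chosen for illustration; it has singular value decomposition $Tv = 3 \langle v, e_1 \rangle f_1 + 2 \langle v, e_2 \rangle f_2$ with $e_1 = (1, 0)$, $e_2 = (0, 1)$, $f_1 = (0, 1)$, $f_2 = (-1, 0)$. The function name `ellipsoid_value` and the grid of sample points are likewise hypothetical.

```python
import math

A = [[0.0, -2.0], [3.0, 0.0]]       # matrix of T (chosen for this sketch)
f = [(0.0, 1.0), (-1.0, 0.0)]       # f_1, f_2
s = [3.0, 2.0]                      # singular values s_1, s_2

def apply(M, v):
    """Multiply a 2-by-2 matrix by a vector."""
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def dot(u, v):
    """Standard Euclidean inner product on R^2."""
    return u[0] * v[0] + u[1] * v[1]

def ellipsoid_value(w):
    """Left side of the inequality defining E(3 f_1, 2 f_2)."""
    return sum(dot(w, f_k) ** 2 / s_k ** 2 for f_k, s_k in zip(f, s))

# Sample points of the open unit ball on a polar grid.
points = [(rho * math.cos(theta), rho * math.sin(theta))
          for rho in (0.1, 0.5, 0.9, 0.999)
          for theta in (2 * math.pi * j / 12 for j in range(12))]

# For each v in the ball, Tv should satisfy the ellipsoid inequality,
# and the ellipsoid value of Tv should equal ||v||^2.
inside = [ellipsoid_value(apply(A, v)) < 1 for v in points]
gaps = [abs(ellipsoid_value(apply(A, v)) - dot(v, v)) for v in points]
print(all(inside), max(gaps))
```

The equality between the ellipsoid value of $Tv$ and $\|v\|^2$ is exactly the displayed identity used in the first half of the proof.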

We now use the previous result to show that invertible operators take all
ellipsoids, not just the ball of radius 1, to ellipsoids.


7.101 invertible operator takes ellipsoids to ellipsoids 


Suppose $T \in \mathcal{L}(V)$ is invertible and $E$ is an ellipsoid in $V$. Then $T(E)$ is an
ellipsoid in $V$.

Proof There exist an orthonormal basis $f_1, \dots, f_n$ of $V$ and positive numbers $s_1, \dots, s_n$
such that $E = E(s_1 f_1, \dots, s_n f_n)$. Define $S \in \mathcal{L}(V)$ by

\[ S(a_1 f_1 + \cdots + a_n f_n) = a_1 s_1 f_1 + \cdots + a_n s_n f_n. \]

Then $S$ maps the ball $B$ of $V$ onto $E$, as you can verify. Thus

\[ T(E) = T(S(B)) = (TS)(B). \]

The equation above and 7.99, applied to $TS$, show that $T(E)$ is an ellipsoid in $V$.


Recall (see 3.95) that if $u \in V$ and $\Omega \subseteq V$ then $u + \Omega$ is defined by

\[ u + \Omega = \{ u + w : w \in \Omega \}. \]

Geometrically, the sets $\Omega$ and $u + \Omega$ look the same, but they are in different
locations.

In the following definition, if dim V = 2 then the word parallelogram is often 
used instead of parallelepiped. 


7.102 definition: $P(v_1, \dots, v_n)$, parallelepiped

Suppose $v_1, \dots, v_n$ is a basis of $V$. Let

\[ P(v_1, \dots, v_n) = \{ a_1 v_1 + \cdots + a_n v_n : a_1, \dots, a_n \in (0, 1) \}. \]

A parallelepiped is a set of the form $u + P(v_1, \dots, v_n)$ for some $u \in V$. The
vectors $v_1, \dots, v_n$ are called the edges of this parallelepiped.


7.103 example: parallelepipeds

[Figure: the parallelepiped $(0.3, 0.5) + P((1, 0), (1, 1))$ in $\mathbf{R}^2$.]

[Figure: a parallelepiped in $\mathbf{R}^3$.]
7.104 invertible operator takes parallelepipeds to parallelepipeds


Suppose $u \in V$ and $v_1, \dots, v_n$ is a basis of $V$. Suppose $T \in \mathcal{L}(V)$ is invertible.
Then

\[ T\bigl(u + P(v_1, \dots, v_n)\bigr) = Tu + P(Tv_1, \dots, Tv_n). \]


Proof  Because $T$ is invertible, the list $Tv_1, \dots, Tv_n$ is a basis of $V$. The linearity
of $T$ implies that

\[ T(u + a_1 v_1 + \cdots + a_n v_n) = Tu + a_1 Tv_1 + \cdots + a_n Tv_n \]

for all $a_1, \dots, a_n \in (0, 1)$. Thus $T\bigl(u + P(v_1, \dots, v_n)\bigr) = Tu + P(Tv_1, \dots, Tv_n)$.

Just as the rectangles are distinguished among the parallelograms in $\mathbf{R}^2$, we
give a special name to the parallelepipeds in $V$ whose defining edges are orthogonal
to each other.


7.105 definition: box 


A box in $V$ is a set of the form

\[ u + P(r_1 e_1, \dots, r_n e_n), \]

where $u \in V$ and $r_1, \dots, r_n$ are positive numbers and $e_1, \dots, e_n$ is an orthonormal
basis of $V$.


Note that in the special case of $\mathbf{R}^2$ each box is a rectangle, but the terminology
box can be used in all dimensions.


7.106 example: boxes 


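[Figure, two panels. Left: the box $(1,0) + P(\sqrt{2}\,e_1, \sqrt{2}\,e_2)$, where $e_1 = \bigl(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\bigr)$ and $e_2 = \bigl(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\bigr)$. Right: the box $P(e_1, 2e_2, e_3)$, where $e_1, e_2, e_3$ is the standard basis of $\mathbf{R}^3$.]
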
Suppose $T \in \mathcal{L}(V)$ is invertible. Then $T$ maps every parallelepiped in $V$
to a parallelepiped in V (by 7.104). In particular, T maps every box in V to a 
parallelepiped in V. This raises the question of whether T maps some boxes in 
V to boxes in V. The following result answers this question, with the help of the 
singular value decomposition. 
7.107 every invertible operator takes some boxes to boxes


Suppose $T \in \mathcal{L}(V)$ is invertible. Suppose $T$ has singular value decomposition

\[ Tv = s_1\langle v, e_1\rangle f_1 + \cdots + s_n\langle v, e_n\rangle f_n, \]

where $s_1, \dots, s_n$ are the singular values of $T$ and $e_1, \dots, e_n$ and $f_1, \dots, f_n$ are
orthonormal bases of $V$ and the equation above holds for all $v \in V$. Then $T$
maps the box $u + P(r_1 e_1, \dots, r_n e_n)$ onto the box $Tu + P(r_1 s_1 f_1, \dots, r_n s_n f_n)$ for
all positive numbers $r_1, \dots, r_n$ and all $u \in V$.


Proof  If $a_1, \dots, a_n \in (0, 1)$ and $r_1, \dots, r_n$ are positive numbers and $u \in V$, then

\[ T(u + a_1 r_1 e_1 + \cdots + a_n r_n e_n) = Tu + a_1 r_1 s_1 f_1 + \cdots + a_n r_n s_n f_n. \]

Thus $T\bigl(u + P(r_1 e_1, \dots, r_n e_n)\bigr) = Tu + P(r_1 s_1 f_1, \dots, r_n s_n f_n)$.
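
The statement of 7.107 can be checked numerically on a small case. The sketch below is only an illustration, not part of the proof: the bases `e` and `f`, the singular values `s`, and the edge lengths `r` are made-up values for an operator on $\mathbf{R}^2$, chosen so that the singular value decomposition is known in advance.

```python
import math

# Hypothetical data (not from the text): orthonormal bases e and f of R^2
# and singular values s, so that Tv = s_1<v,e_1>f_1 + s_2<v,e_2>f_2.
c = 1 / math.sqrt(2)
e = [(1.0, 0.0), (0.0, 1.0)]
f = [(c, c), (-c, c)]
s = [3.0, 0.5]

def dot(x, y):
    """Standard inner product on R^2."""
    return sum(a * b for a, b in zip(x, y))

def T(v):
    """Apply the operator defined by the singular value decomposition above."""
    out = [0.0, 0.0]
    for sk, ek, fk in zip(s, e, f):
        coeff = sk * dot(v, ek)
        out = [o + coeff * fi for o, fi in zip(out, fk)]
    return tuple(out)

# Defining edges r_1 e_1, r_2 e_2 of a box, and their images under T.
r = [2.0, 4.0]
edges = [tuple(rk * x for x in ek) for rk, ek in zip(r, e)]
image_edges = [T(edge) for edge in edges]

# The image edges should be orthogonal with lengths r_k * s_k.
edge_lengths = [math.sqrt(dot(w, w)) for w in image_edges]
edges_orthogonal = abs(dot(image_edges[0], image_edges[1])) < 1e-12
volume_ratio = (edge_lengths[0] * edge_lengths[1]) / (r[0] * r[1])
```

Here the image edges have lengths $r_1 s_1 = 6$ and $r_2 s_2 = 2$ and are orthogonal, so the image is again a box, and the ratio of the two volumes is $s_1 s_2 = 1.5$.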


Volume via Singular Values 


Our goal in this subsection is to understand how an operator changes the volume 
of subsets of its domain. Because notions of volume belong to analysis rather 
than to linear algebra, we will work only with an intuitive notion of volume. Our 
intuitive approach to volume can be converted into appropriate correct definitions, 
correct statements, and correct proofs using the machinery of analysis. 

Our intuition about volume works best in real inner product spaces. Thus the 
assumption that F = R will appear frequently in the rest of this subsection. 

If $\dim V = n$, then by volume we will mean $n$-dimensional volume. You
should be familiar with this concept in $\mathbf{R}^3$. When $n = 2$, this is usually called area
instead of volume, but for consistency we use the word volume in all dimensions. 
The most fundamental intuition about volume is that the volume of a box (whose 
defining edges are by definition orthogonal to each other) is the product of the 
lengths of the defining edges. Thus we make the following definition. 


7.108 definition: volume of a box 


Suppose $\mathbf{F} = \mathbf{R}$. If $u \in V$ and $r_1, \dots, r_n$ are positive numbers and $e_1, \dots, e_n$ is
an orthonormal basis of $V$, then

\[ \operatorname{volume}\bigl(u + P(r_1 e_1, \dots, r_n e_n)\bigr) = r_1 \times \cdots \times r_n. \]


The definition above agrees with the familiar formulas for the area (which we
are calling the volume) of a rectangle in $\mathbf{R}^2$ and for the volume of a box in $\mathbf{R}^3$. For
example, the first box in Example 7.106 has two-dimensional volume (or area) 2
because the defining edges of that box have length $\sqrt{2}$ and $\sqrt{2}$. The second box
in Example 7.106 has three-dimensional volume 2 because the defining edges of
that box have length 1, 2, and 1.
To define the volume of a subset of $V$, approximate the
subset by a finite collection of disjoint boxes, and then add up
the volumes of the approximating collection of boxes. As we
approximate a subset of $V$ more accurately by disjoint unions
of more boxes, we get a better approximation to the volume.

[Figure: a ball approximated by five boxes. Caption: Volume of this ball ≈ sum of the volumes of the five boxes.]

These ideas should remind you of how the Riemann integral
is defined by approximating the area under a curve by a disjoint
collection of rectangles. This discussion leads to the following
nonrigorous but intuitive definition.


7.109 definition: volume 


Suppose $\mathbf{F} = \mathbf{R}$ and $\Omega \subseteq V$. Then the volume of $\Omega$, denoted by $\operatorname{volume} \Omega$, is
approximately the sum of the volumes of a collection of disjoint boxes that
approximate $\Omega$.


We are ignoring many reasonable questions by taking an intuitive approach to
volume. For example, if we approximate $\Omega$ by boxes with respect to one basis,
do we get the same volume if we approximate $\Omega$ by boxes with respect to a
different basis? If $\Omega_1$ and $\Omega_2$ are disjoint subsets of $V$, is $\operatorname{volume}(\Omega_1 \cup \Omega_2) =
\operatorname{volume} \Omega_1 + \operatorname{volume} \Omega_2$? Provided that we consider only reasonably nice subsets
of $V$, techniques of analysis show that both these questions have affirmative
answers that agree with our intuition about volume.


7.110 example: volume change by a linear map 


Suppose that $T \in \mathcal{L}(\mathbf{R}^2)$ is defined by
$Tv = 2\langle v, e_1\rangle e_1 + \langle v, e_2\rangle e_2$, where $e_1, e_2$ is the
standard basis of $\mathbf{R}^2$. This linear map stretches
by a factor of 2 along the $e_1$ axis. The ball
approximated by five boxes above gets mapped
by $T$ to the ellipsoid shown here. Each of the
five boxes in the original figure gets mapped to
a box of twice the width and the same height
as in the original figure. Hence each box gets
mapped to a box of twice the volume (area) as in the original figure. The sum
of the volumes of the five new boxes approximates the volume of the ellipsoid.
Thus $T$ changes the volume of the ball by a factor of 2.

[Figure: the ellipsoid approximated by five boxes. Caption: Each box here has twice the width and the same height as the boxes in the previous figure.]


In the example above, $T$ maps boxes with respect to the basis $e_1, e_2$ to boxes
with respect to the same basis; thus we can see how $T$ changes volume. In general,
an operator maps boxes to parallelepipeds that are not boxes. However, if we 
choose the right basis (coming from the singular value decomposition!), then 
boxes with respect to that basis get mapped to boxes with respect to a possibly 
different basis, as shown in 7.107. This observation leads to a natural proof of 
the following result. 
7.111 volume changes by a factor of the product of the singular values


Suppose $\mathbf{F} = \mathbf{R}$, $T \in \mathcal{L}(V)$ is invertible, and $\Omega \subseteq V$. Then

\[ \operatorname{volume} T(\Omega) = (\text{product of singular values of } T)(\operatorname{volume} \Omega). \]


Proof  Suppose $T$ has singular value decomposition

\[ Tv = s_1\langle v, e_1\rangle f_1 + \cdots + s_n\langle v, e_n\rangle f_n \]

for all $v \in V$, where $e_1, \dots, e_n$ and $f_1, \dots, f_n$ are orthonormal bases of $V$.

Approximate $\Omega$ by boxes of the form $u + P(r_1 e_1, \dots, r_n e_n)$, which have volume
$r_1 \times \cdots \times r_n$. The operator $T$ maps each box $u + P(r_1 e_1, \dots, r_n e_n)$ onto the box
$Tu + P(r_1 s_1 f_1, \dots, r_n s_n f_n)$, which has volume $(s_1 \times \cdots \times s_n)(r_1 \times \cdots \times r_n)$.

The operator $T$ maps a collection of boxes that approximate $\Omega$ onto a collection
of boxes that approximate $T(\Omega)$. Because $T$ changes the volume of each box in a
collection that approximates $\Omega$ by a factor of $s_1 \times \cdots \times s_n$, the linear map $T$ changes
the volume of $\Omega$ by the same factor.


Suppose $T \in \mathcal{L}(V)$. As we will see when we get to determinants, the product
of the singular values of $T$ equals $|\det T|$; see 9.60 and 9.61.
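
The relation between singular values and $|\det T|$ can be seen numerically for an operator on $\mathbf{R}^2$. The sketch below is an illustration under assumptions: the helper name `singular_values_2x2` and the sample matrix are hypothetical, and the singular values are computed as the square roots of the eigenvalues of $T^*T$, found from the trace and determinant of that $2 \times 2$ matrix.

```python
import math

def singular_values_2x2(a, b, c, d):
    """Singular values of the real matrix [[a, b], [c, d]].

    T*T is symmetric with trace a^2 + b^2 + c^2 + d^2 and determinant
    (ad - bc)^2, so its eigenvalues follow from the quadratic formula.
    """
    tr = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = math.sqrt(max(tr * tr - 4 * det * det, 0.0))
    largest = math.sqrt((tr + disc) / 2)
    smallest = math.sqrt(max((tr - disc) / 2, 0.0))
    return largest, smallest

# Hypothetical sample operator on R^2 with matrix [[2, 1], [0, 3]].
s1, s2 = singular_values_2x2(2.0, 1.0, 0.0, 3.0)
product_of_singular_values = s1 * s2
abs_det = abs(2.0 * 3.0 - 1.0 * 0.0)
```

For this matrix the product of the two singular values agrees with $|\det T| = 6$ up to rounding, which by 7.111 is the factor by which the operator changes area.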


Properties of an Operator as Determined by Its Eigenvalues 


We conclude this chapter by presenting the table below. The context of this 
table is a finite-dimensional complex inner product space. The first column of 
the table shows a property that a normal operator on such a space might have. 
The second column of the table shows a subset of C such that the operator has 
the corresponding property if and only if all eigenvalues of the operator lie in 
the specified subset. For example, the first row of the table states that a normal 
operator is invertible if and only if all its eigenvalues are nonzero (this first row 
is the only one in the table that does not need the hypothesis that the operator is 
normal). 

Make sure you can explain why all results in the table hold. For example, 
the last row of the table holds because the norm of an operator equals its largest 
singular value (by 7.85) and the singular values of a normal operator, assuming 
F = C, equal the absolute values of the eigenvalues (by Exercise 7 in Section 7E). 


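properties of a normal operator      eigenvalues are contained in

invertible                           $\mathbf{C} \setminus \{0\}$
self-adjoint                         $\mathbf{R}$
skew                                 $\{\lambda \in \mathbf{C} : \operatorname{Re} \lambda = 0\}$
orthogonal projection                $\{0, 1\}$
positive                             $[0, \infty)$
unitary                              $\{\lambda \in \mathbf{C} : |\lambda| = 1\}$
norm is less than 1                  $\{\lambda \in \mathbf{C} : |\lambda| < 1\}$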
Exercises 7F


1   Prove that if $S, T \in \mathcal{L}(V, W)$, then $\bigl|\, \|S\| - \|T\| \,\bigr| \le \|S - T\|$.
    The inequality above is called the reverse triangle inequality.

2   Suppose that $T \in \mathcal{L}(V)$ is self-adjoint or that $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$ is
    normal. Prove that
    \[ \|T\| = \max\{|\lambda| : \lambda \text{ is an eigenvalue of } T\}. \]

3   Suppose $T \in \mathcal{L}(V, W)$ and $v \in V$. Prove that
    \[ \|Tv\| = \|T\|\,\|v\| \iff T^*Tv = \|T\|^2 v. \]

4   Suppose $T \in \mathcal{L}(V, W)$, $v \in V$, and $\|Tv\| = \|T\|\,\|v\|$. Prove that if $u \in V$ and
    $\langle u, v\rangle = 0$, then $\langle Tu, Tv\rangle = 0$.

5   Suppose $U$ is a finite-dimensional inner product space, $T \in \mathcal{L}(V, U)$, and
    $S \in \mathcal{L}(U, W)$. Prove that
    \[ \|ST\| \le \|S\|\,\|T\|. \]

6   Prove or give a counterexample: If $S, T \in \mathcal{L}(V)$, then $\|ST\| = \|TS\|$.

7   Show that defining $d(S, T) = \|S - T\|$ for $S, T \in \mathcal{L}(V, W)$ makes $d$ a metric
    on $\mathcal{L}(V, W)$.
    This exercise is intended for readers who are familiar with metric spaces.

8   (a) Prove that if $T \in \mathcal{L}(V)$ and $\|I - T\| < 1$, then $T$ is invertible.
    (b) Suppose that $S \in \mathcal{L}(V)$ is invertible. Prove that if $T \in \mathcal{L}(V)$ and
        $\|S - T\| < 1/\|S^{-1}\|$, then $T$ is invertible.
    This exercise shows that the set of invertible operators in $\mathcal{L}(V)$ is an open
    subset of $\mathcal{L}(V)$, using the metric defined in Exercise 7.

9   Suppose $T \in \mathcal{L}(V)$. Prove that for every $\epsilon > 0$, there exists an invertible
    operator $S \in \mathcal{L}(V)$ such that $0 < \|T - S\| < \epsilon$.

10  Suppose $\dim V > 1$ and $T \in \mathcal{L}(V)$ is not invertible. Prove that for every
    $\epsilon > 0$, there exists $S \in \mathcal{L}(V)$ such that $0 < \|T - S\| < \epsilon$ and $S$ is not
    invertible.

11  Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Prove that for every $\epsilon > 0$ there exists a
    diagonalizable operator $S \in \mathcal{L}(V)$ such that $0 < \|T - S\| < \epsilon$.

12  Suppose $T \in \mathcal{L}(V)$ is a positive operator. Show that $\|\sqrt{T}\| = \sqrt{\|T\|}$.

13  Suppose $S, T \in \mathcal{L}(V)$ are positive operators. Show that
    \[ \|S - T\| \le \max\{\|S\|, \|T\|\} \le \|S + T\|. \]

14  Suppose $U$ and $W$ are subspaces of $V$ such that $\|P_U - P_W\| < 1$. Prove that
    $\dim U = \dim W$.


15  Define $T \in \mathcal{L}(\mathbf{F}^3)$ by
    \[ T(z_1, z_2, z_3) = (z_3, 2z_1, 3z_2). \]
    Find (explicitly) a unitary operator $S \in \mathcal{L}(\mathbf{F}^3)$ such that $T = S\sqrt{T^*T}$.

16  Suppose $S \in \mathcal{L}(V)$ is a positive invertible operator. Prove that there exists
    $\delta > 0$ such that $T$ is a positive operator for every self-adjoint operator
    $T \in \mathcal{L}(V)$ with $\|S - T\| < \delta$.

17  Prove that if $u \in V$ and $\varphi_u$ is the linear functional on $V$ defined by the
    equation $\varphi_u(v) = \langle v, u\rangle$, then $\|\varphi_u\| = \|u\|$.
    Here we are thinking of the scalar field $\mathbf{F}$ as an inner product space with
    $\langle \alpha, \beta\rangle = \alpha\overline{\beta}$ for all $\alpha, \beta \in \mathbf{F}$. Thus $\|\varphi_u\|$ means the norm of $\varphi_u$ as a linear
    map from $V$ to $\mathbf{F}$.

18  Suppose $e_1, \dots, e_n$ is an orthonormal basis of $V$ and $T \in \mathcal{L}(V, W)$.
    (a) Prove that $\max\{\|Te_1\|, \dots, \|Te_n\|\} \le \|T\| \le \bigl(\|Te_1\|^2 + \cdots + \|Te_n\|^2\bigr)^{1/2}$.
    (b) Prove that $\|T\| = \bigl(\|Te_1\|^2 + \cdots + \|Te_n\|^2\bigr)^{1/2}$ if and only if $\dim \operatorname{range} T \le 1$.
    Here $e_1, \dots, e_n$ is an arbitrary orthonormal basis of $V$, not necessarily
    connected with a singular value decomposition of $T$. If $s_1, \dots, s_n$ is the list
    of singular values of $T$, then the right side of the inequality above equals
    $(s_1^2 + \cdots + s_n^2)^{1/2}$, as was shown in Exercise 11(a) in Section 7E.

19  Prove that if $T \in \mathcal{L}(V, W)$, then $\|T^*T\| = \|T\|^2$.
    This formula for $\|T^*T\|$ leads to the important subject of $C^*$-algebras.

20  Suppose $T \in \mathcal{L}(V)$ is normal. Prove that $\|T^k\| = \|T\|^k$ for every positive
    integer $k$.

21  Suppose $\dim V > 1$ and $\dim W > 1$. Prove that the norm on $\mathcal{L}(V, W)$ does
    not come from an inner product. In other words, prove that there does not
    exist an inner product on $\mathcal{L}(V, W)$ such that
    \[ \max\{\|Tv\| : v \in V \text{ and } \|v\| \le 1\} = \sqrt{\langle T, T\rangle} \]
    for all $T \in \mathcal{L}(V, W)$.

22  Suppose $T \in \mathcal{L}(V, W)$. Let $n = \dim V$ and let $s_1 \ge \cdots \ge s_n$ denote the
    singular values of $T$. Prove that if $1 \le k \le n$, then
    \[ \min\{\|T|_U\| : U \text{ is a subspace of } V \text{ with } \dim U = k\} = s_{n-k+1}. \]

23  Suppose $T \in \mathcal{L}(V, W)$. Show that $T$ is uniformly continuous with respect
    to the metrics on $V$ and $W$ that arise from the norms on those spaces (see
    Exercise 23 in Section 6B).

24  Suppose $T \in \mathcal{L}(V)$ is invertible. Prove that
    \[ \|T^{-1}\| = \|T\|^{-1} \iff \frac{T}{\|T\|} \text{ is a unitary operator.} \]


25  Fix $u, x \in V$ with $u \ne 0$. Define $T \in \mathcal{L}(V)$ by $Tv = \langle v, u\rangle x$ for every $v \in V$.
    Prove that
    \[ \sqrt{T^*T}\,v = \frac{\|x\|}{\|u\|}\langle v, u\rangle u \]
    for every $v \in V$.

26  Suppose $T \in \mathcal{L}(V)$. Prove that $T$ is invertible if and only if there exists a
    unique unitary operator $S \in \mathcal{L}(V)$ such that $T = S\sqrt{T^*T}$.

27  Suppose $T \in \mathcal{L}(V)$ and $s_1, \dots, s_n$ are the singular values of $T$. Let $e_1, \dots, e_n$
    and $f_1, \dots, f_n$ be orthonormal bases of $V$ such that
    \[ Tv = s_1\langle v, e_1\rangle f_1 + \cdots + s_n\langle v, e_n\rangle f_n \]
    for all $v \in V$. Define $S \in \mathcal{L}(V)$ by
    \[ Sv = \langle v, e_1\rangle f_1 + \cdots + \langle v, e_n\rangle f_n. \]
    (a) Show that $S$ is unitary and $\|T - S\| = \max\{|s_1 - 1|, \dots, |s_n - 1|\}$.
    (b) Show that if $E \in \mathcal{L}(V)$ is unitary, then $\|T - E\| \ge \|T - S\|$.
    This exercise finds a unitary operator $S$ that is as close as possible (among
    the unitary operators) to a given operator $T$.

28  Suppose $T \in \mathcal{L}(V)$. Prove that there exists a unitary operator $S \in \mathcal{L}(V)$
    such that $T = \sqrt{TT^*}\,S$.

29  Suppose $T \in \mathcal{L}(V)$.
    (a) Use the polar decomposition to show that there exists a unitary operator
        $S \in \mathcal{L}(V)$ such that $TT^* = ST^*TS^*$.
    (b) Show how (a) implies that $T$ and $T^*$ have the same singular values.

30  Suppose $T \in \mathcal{L}(V)$, $S \in \mathcal{L}(V)$ is a unitary operator, and $R \in \mathcal{L}(V)$ is a
    positive operator such that $T = SR$. Prove that $R = \sqrt{T^*T}$.
    This exercise shows that if we write $T$ as the product of a unitary operator
    and a positive operator (as in the polar decomposition 7.93), then the
    positive operator equals $\sqrt{T^*T}$.

31  Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$ is normal. Prove that there exists a unitary
    operator $S \in \mathcal{L}(V)$ such that $T = S\sqrt{T^*T}$ and such that $S$ and $\sqrt{T^*T}$ both
    have diagonal matrices with respect to the same orthonormal basis of $V$.

32  Suppose that $T \in \mathcal{L}(V, W)$ and $T \ne 0$. Let $s_1, \dots, s_m$ denote the positive
    singular values of $T$. Show that there exists an orthonormal basis $e_1, \dots, e_m$
    of $(\operatorname{null} T)^\perp$ such that
    \[ T\Bigl(E\Bigl(\frac{e_1}{s_1}, \dots, \frac{e_m}{s_m}\Bigr)\Bigr) \]
    equals the ball in $\operatorname{range} T$ of radius 1 centered at 0.
Operators on Complex Vector Spaces


In this chapter we delve deeper into the structure of operators, with most of the 
attention on complex vector spaces. Some of the results in this chapter apply to 
both real and complex vector spaces; thus we do not make a standing assumption 
that F = C. Also, an inner product does not help with this material, so we return 
to the general setting of a finite-dimensional vector space. 

Even on a finite-dimensional complex vector space, an operator may not have 
enough eigenvectors to form a basis of the vector space. Thus we will consider the 
closely related objects called generalized eigenvectors. We will see that for each 
operator on a finite-dimensional complex vector space, there is a basis of the vector 
space consisting of generalized eigenvectors of the operator. The generalized 
eigenspace decomposition then provides a good description of arbitrary operators 
on a finite-dimensional complex vector space. 

Nilpotent operators, which are operators that when raised to some power 
equal 0, have an important role in these investigations. Nilpotent operators provide 
a key tool in our proof that every invertible operator on a finite-dimensional 
complex vector space has a square root and in our approach to Jordan form. 

This chapter concludes by defining the trace and proving its key properties. 


standing assumptions for this chapter 


• $\mathbf{F}$ denotes $\mathbf{R}$ or $\mathbf{C}$.
• $V$ denotes a finite-dimensional nonzero vector space over $\mathbf{F}$.




The Long Room of the Old Library at the University of Dublin, where William Hamilton
(1805–1865) was a student and then a faculty member. Hamilton proved a special case
of what we now call the Cayley–Hamilton theorem in 1853.
Null Spaces of Powers of an Operator


We begin this chapter with a study of null spaces of powers of an operator. 


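8.1 sequence of increasing null spaces


Suppose $T \in \mathcal{L}(V)$. Then

\[ \{0\} = \operatorname{null} T^0 \subseteq \operatorname{null} T^1 \subseteq \cdots \subseteq \operatorname{null} T^k \subseteq \operatorname{null} T^{k+1} \subseteq \cdots. \]


Proof  Suppose $k$ is a nonnegative integer and $v \in \operatorname{null} T^k$. Then $T^k v = 0$,
which implies that $T^{k+1}v = T(T^k v) = T(0) = 0$. Thus $v \in \operatorname{null} T^{k+1}$. Hence
$\operatorname{null} T^k \subseteq \operatorname{null} T^{k+1}$, as desired.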


The following result states that if two consecutive terms in the sequence of
subspaces above are equal, then all later terms in the sequence are equal.

[Margin note: For similar results about decreasing sequences of ranges, see Exercises 6, 7, and 8.]


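8.2 equality in the sequence of null spaces

Suppose $T \in \mathcal{L}(V)$ and $m$ is a nonnegative integer such that

\[ \operatorname{null} T^m = \operatorname{null} T^{m+1}. \]

Then

\[ \operatorname{null} T^m = \operatorname{null} T^{m+1} = \operatorname{null} T^{m+2} = \operatorname{null} T^{m+3} = \cdots. \]


Proof  Let $k$ be a positive integer. We want to prove that

\[ \operatorname{null} T^{m+k} = \operatorname{null} T^{m+k+1}. \]

We already know from 8.1 that $\operatorname{null} T^{m+k} \subseteq \operatorname{null} T^{m+k+1}$.
To prove the inclusion in the other direction, suppose $v \in \operatorname{null} T^{m+k+1}$. Then

\[ T^{m+1}(T^k v) = T^{m+k+1}v = 0. \]

Hence

\[ T^k v \in \operatorname{null} T^{m+1} = \operatorname{null} T^m. \]

Thus $T^{m+k}v = T^m(T^k v) = 0$, which means that $v \in \operatorname{null} T^{m+k}$. This implies that
$\operatorname{null} T^{m+k+1} \subseteq \operatorname{null} T^{m+k}$, completing the proof.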


The result above raises the question of whether there exists a nonnegative
integer $m$ such that $\operatorname{null} T^m = \operatorname{null} T^{m+1}$. The next result shows that this equality
holds at least when $m$ equals the dimension of the vector space on which $T$
operates.
8.3 null spaces stop growing


Suppose $T \in \mathcal{L}(V)$. Then

\[ \operatorname{null} T^{\dim V} = \operatorname{null} T^{\dim V + 1} = \operatorname{null} T^{\dim V + 2} = \cdots. \]


Proof  We only need to prove that $\operatorname{null} T^{\dim V} = \operatorname{null} T^{\dim V + 1}$ (by 8.2). Suppose
this is not true. Then, by 8.1 and 8.2, we have

\[ \{0\} = \operatorname{null} T^0 \subsetneq \operatorname{null} T^1 \subsetneq \cdots \subsetneq \operatorname{null} T^{\dim V} \subsetneq \operatorname{null} T^{\dim V + 1}, \]

where the symbol $\subsetneq$ means “contained in but not equal to”. At each of the
strict inclusions in the chain above, the dimension increases by at least 1. Thus
$\dim \operatorname{null} T^{\dim V + 1} \ge \dim V + 1$, a contradiction because a subspace of $V$ cannot
have a larger dimension than $\dim V$.
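
The behavior described in 8.1 through 8.3 can be observed by computing $\dim \operatorname{null} T^k$ for successive $k$. The following sketch is an illustration only: the helper names are hypothetical, exact rational arithmetic is used for the row reduction, and the sample matrix is that of the operator $T(z_1, z_2, z_3) = (4z_2, 0, 5z_3)$ with respect to the standard basis.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(A):
    """Rank of A, computed by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][col] / M[r][col]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def null_dimensions(A, max_power):
    """List of dim null A^k for k = 0, 1, ..., max_power."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    dims = []
    for _ in range(max_power + 1):
        dims.append(n - rank(power))  # dim null = n - rank
        power = matmul(power, A)
    return dims

# Matrix of T(z1, z2, z3) = (4 z2, 0, 5 z3) with respect to the standard basis.
A = [[0, 4, 0], [0, 0, 0], [0, 0, 5]]
dims = null_dimensions(A, 5)
```

For this operator the dimensions are 0, 1, 2, 2, 2, 2: the null spaces grow strictly until $k = 2$ and are constant afterward, in particular from $k = \dim V = 3$ onward.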


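It is not true that $V = \operatorname{null} T \oplus \operatorname{range} T$ for every $T \in \mathcal{L}(V)$. However, the
next result can be a useful substitute.


8.4 $V$ is the direct sum of $\operatorname{null} T^{\dim V}$ and $\operatorname{range} T^{\dim V}$


Suppose $T \in \mathcal{L}(V)$. Then

\[ V = \operatorname{null} T^{\dim V} \oplus \operatorname{range} T^{\dim V}. \]


Proof  Let $n = \dim V$. First we show that

8.5   \[ (\operatorname{null} T^n) \cap (\operatorname{range} T^n) = \{0\}. \]

Suppose $v \in (\operatorname{null} T^n) \cap (\operatorname{range} T^n)$. Then $T^n v = 0$, and there exists $u \in V$
such that $v = T^n u$. Applying $T^n$ to both sides of the last equation shows that
$T^n v = T^{2n}u$. Hence $T^{2n}u = 0$, which implies that $T^n u = 0$ (by 8.3). Thus
$v = T^n u = 0$, completing the proof of 8.5.

Now 8.5 implies that $\operatorname{null} T^n + \operatorname{range} T^n$ is a direct sum (by 1.46). Also,

\[ \dim(\operatorname{null} T^n \oplus \operatorname{range} T^n) = \dim \operatorname{null} T^n + \dim \operatorname{range} T^n = \dim V, \]

where the first equality above comes from 3.94 and the second equality comes
from the fundamental theorem of linear maps (3.21). The equation above implies
that $\operatorname{null} T^n \oplus \operatorname{range} T^n = V$ (see 2.39), as desired.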


For an improvement of the result above, see Exercise 19. 


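8.6 example: $\mathbf{F}^3 = \operatorname{null} T^3 \oplus \operatorname{range} T^3$ for $T \in \mathcal{L}(\mathbf{F}^3)$

Suppose $T \in \mathcal{L}(\mathbf{F}^3)$ is defined by

\[ T(z_1, z_2, z_3) = (4z_2, 0, 5z_3). \]

Then $\operatorname{null} T = \{(z_1, 0, 0) : z_1 \in \mathbf{F}\}$ and $\operatorname{range} T = \{(z_1, 0, z_3) : z_1, z_3 \in \mathbf{F}\}$. Thus
$\operatorname{null} T \cap \operatorname{range} T \ne \{0\}$. Hence $\operatorname{null} T + \operatorname{range} T$ is not a direct sum. Also note that
$\operatorname{null} T + \operatorname{range} T \ne \mathbf{F}^3$. However, we have $T^3(z_1, z_2, z_3) = (0, 0, 125z_3)$. Thus we
see that

\[ \operatorname{null} T^3 = \{(z_1, z_2, 0) : z_1, z_2 \in \mathbf{F}\} \quad\text{and}\quad \operatorname{range} T^3 = \{(0, 0, z_3) : z_3 \in \mathbf{F}\}. \]

Hence $\mathbf{F}^3 = \operatorname{null} T^3 \oplus \operatorname{range} T^3$, as expected by 8.4.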


Generalized Eigenvectors 


Some operators do not have enough eigenvectors to lead to good descriptions of 
their behavior. Thus in this subsection we introduce the concept of generalized 
eigenvectors, which will play a major role in our description of the structure of an 
operator. 

To understand why we need more than eigenvectors, let’s examine the question
of describing an operator by decomposing its domain into invariant subspaces. Fix
$T \in \mathcal{L}(V)$. We seek to describe $T$ by finding a “nice” direct sum decomposition

\[ V = V_1 \oplus \cdots \oplus V_n, \]

where each $V_k$ is a subspace of $V$ invariant under $T$. The simplest possible nonzero
invariant subspaces are one-dimensional. A decomposition as above in which
each $V_k$ is a one-dimensional subspace of $V$ invariant under $T$ is possible if and
only if $V$ has a basis consisting of eigenvectors of $T$ (see 5.55). This happens if
and only if $V$ has an eigenspace decomposition

8.7   \[ V = E(\lambda_1, T) \oplus \cdots \oplus E(\lambda_m, T), \]

where $\lambda_1, \dots, \lambda_m$ are the distinct eigenvalues of $T$ (see 5.55).

The spectral theorem in the previous chapter shows that if V is an inner product 
space, then a decomposition of the form 8.7 holds for every self-adjoint operator 
if F = R and for every normal operator if F = C because operators of those types 
have enough eigenvectors to form a basis of V (see 7.29 and 7.31). 

However, a decomposition of the form 8.7 may not hold for more general 
operators, even on a complex vector space. An example was given by the operator 
in 5.57, which does not have enough eigenvectors for 8.7 to hold. Generalized 
eigenvectors and generalized eigenspaces, which we now introduce, will remedy 
this situation. 


8.8 definition: generalized eigenvector 


Suppose $T \in \mathcal{L}(V)$ and $\lambda$ is an eigenvalue of $T$. A vector $v \in V$ is called a
generalized eigenvector of $T$ corresponding to $\lambda$ if $v \ne 0$ and

\[ (T - \lambda I)^k v = 0 \]

for some positive integer $k$.
A nonzero vector $v \in V$ is a generalized eigenvector of $T$ corresponding to $\lambda$
if and only if

\[ (T - \lambda I)^{\dim V} v = 0, \]

as follows from applying 8.1 and 8.3 to the operator $T - \lambda I$.

[Margin note: Generalized eigenvalues are not defined because doing so would not lead
to anything new. Reason: if $(T - \lambda I)^k$ is not injective for some positive integer $k$,
then $T - \lambda I$ is not injective, and hence $\lambda$ is an eigenvalue of $T$.]

As we know, an operator on a complex vector space may not have enough 
eigenvectors to form a basis of the domain. The next result shows that on a 
complex vector space there are enough generalized eigenvectors to do this. 


8.9 a basis of generalized eigenvectors 


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Then there is a basis of $V$ consisting of
generalized eigenvectors of $T$.


Proof  Let $n = \dim V$. We will use induction on $n$. To get started, note that
the desired result holds if $n = 1$ because then every nonzero vector in $V$ is an
eigenvector of $T$.

Now suppose $n > 1$ and the desired result holds for all smaller values
of $\dim V$. Let $\lambda$ be an eigenvalue of $T$. Applying 8.4 to $T - \lambda I$ shows that

\[ V = \operatorname{null}(T - \lambda I)^n \oplus \operatorname{range}(T - \lambda I)^n. \]

[Margin note: This step is where we use the hypothesis that $\mathbf{F} = \mathbf{C}$, because if $\mathbf{F} = \mathbf{R}$ then $T$
may not have any eigenvalues.]

If $\operatorname{null}(T - \lambda I)^n = V$, then every nonzero vector in $V$ is a generalized
eigenvector of $T$, and thus in this case there is a basis of $V$ consisting of generalized
eigenvectors of $T$. Hence we can assume that $\operatorname{null}(T - \lambda I)^n \ne V$, which implies
that $\operatorname{range}(T - \lambda I)^n \ne \{0\}$.

Also, $\operatorname{null}(T - \lambda I)^n \ne \{0\}$, because $\lambda$ is an eigenvalue of $T$. Thus we have

\[ 0 < \dim \operatorname{range}(T - \lambda I)^n < n. \]

Furthermore, $\operatorname{range}(T - \lambda I)^n$ is invariant under $T$ [by 5.18 with $p(z) = (z - \lambda)^n$].
Let $S \in \mathcal{L}\bigl(\operatorname{range}(T - \lambda I)^n\bigr)$ equal $T$ restricted to $\operatorname{range}(T - \lambda I)^n$. Our induction
hypothesis applied to the operator $S$ implies that there is a basis of $\operatorname{range}(T - \lambda I)^n$
consisting of generalized eigenvectors of $S$, which of course are generalized
eigenvectors of $T$. Adjoining that basis of $\operatorname{range}(T - \lambda I)^n$ to a basis of $\operatorname{null}(T - \lambda I)^n$
gives a basis of $V$ consisting of generalized eigenvectors of $T$.


If F = R and dimV > 1, then some operators on V have the property that 
there exists a basis of V consisting of generalized eigenvectors of the operator, 
and (unlike what happens when F = C) other operators do not have this property. 
See Exercise 11 for a necessary and sufficient condition that determines whether 
an operator has this property. 
8.10 example: generalized eigenvectors of an operator on $\mathbf{C}^3$

Define $T \in \mathcal{L}(\mathbf{C}^3)$ by

\[ T(z_1, z_2, z_3) = (4z_2, 0, 5z_3) \]

for each $(z_1, z_2, z_3) \in \mathbf{C}^3$. A routine use of the definition of eigenvalue shows that
the eigenvalues of $T$ are 0 and 5. Furthermore, the eigenvectors corresponding to
the eigenvalue 0 are the nonzero vectors of the form $(z_1, 0, 0)$, and the eigenvectors
corresponding to the eigenvalue 5 are the nonzero vectors of the form $(0, 0, z_3)$.
Hence this operator does not have enough eigenvectors to span its domain $\mathbf{C}^3$.

We compute that $T^3(z_1, z_2, z_3) = (0, 0, 125z_3)$. Thus 8.1 and 8.3 imply that the
generalized eigenvectors of $T$ corresponding to the eigenvalue 0 are the nonzero
vectors of the form $(z_1, z_2, 0)$.

We also have $(T - 5I)^3(z_1, z_2, z_3) = (-125z_1 + 300z_2, -125z_2, 0)$. Thus the
generalized eigenvectors of $T$ corresponding to the eigenvalue 5 are the nonzero
vectors of the form $(0, 0, z_3)$.

The paragraphs above show that each of the standard basis vectors of $\mathbf{C}^3$ is a
generalized eigenvector of $T$. Thus $\mathbf{C}^3$ indeed has a basis consisting of generalized
eigenvectors of $T$, as promised by 8.9.
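
The two computations in Example 8.10 can be reproduced mechanically. The sketch below is an illustration with hypothetical helper names; it works with the matrices of $T$ and $T - 5I$ with respect to the standard basis and cubes each of them using integer arithmetic.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, k):
    """A raised to the nonnegative integer power k."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result

def apply(A, v):
    """Matrix A applied to the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Matrix of T(z1, z2, z3) = (4 z2, 0, 5 z3) with respect to the standard basis.
T = [[0, 4, 0], [0, 0, 0], [0, 0, 5]]
# Matrix of T - 5I.
N = [[T[i][j] - 5 * int(i == j) for j in range(3)] for i in range(3)]

T3 = matpow(T, 3)  # rows encode (z1, z2, z3) -> (0, 0, 125 z3)
N3 = matpow(N, 3)  # rows encode (z1, z2, z3) -> (-125 z1 + 300 z2, -125 z2, 0)

e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
killed_by_T3 = [apply(T3, e1), apply(T3, e2)]  # e1, e2 belong to eigenvalue 0
killed_by_N3 = apply(N3, e3)                   # e3 belongs to eigenvalue 5
```

The first two standard basis vectors are sent to 0 by $T^3$ and the third is sent to 0 by $(T - 5I)^3$, matching the conclusion of the example.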


If $v$ is an eigenvector of $T \in \mathcal{L}(V)$, then the corresponding eigenvalue $\lambda$ is
uniquely determined by the equation $Tv = \lambda v$, which can be satisfied by only one
$\lambda \in \mathbf{F}$ (because $v \ne 0$). However, if $v$ is a generalized eigenvector of $T$, then it
is not obvious that the equation $(T - \lambda I)^{\dim V} v = 0$ can be satisfied by only one
$\lambda \in \mathbf{F}$. Fortunately, the next result tells us that all is well on this issue.


8.11 generalized eigenvector corresponds to a unique eigenvalue 


Suppose $T \in \mathcal{L}(V)$. Then each generalized eigenvector of $T$ corresponds to
only one eigenvalue of $T$.


Proof  Suppose $v \in V$ is a generalized eigenvector of $T$ corresponding to
eigenvalues $\alpha$ and $\lambda$ of $T$. Let $m$ be the smallest positive integer such that $(T - \alpha I)^m v = 0$.
Let $n = \dim V$. Then

\[ 0 = (T - \lambda I)^n v = \bigl((T - \alpha I) + (\alpha - \lambda)I\bigr)^n v = \sum_{k=0}^{n} b_k (\alpha - \lambda)^{n-k} (T - \alpha I)^k v, \]

where $b_0 = 1$ and the values of the other binomial coefficients $b_k$ do not matter.
Apply the operator $(T - \alpha I)^{m-1}$ to both sides of the equation above, getting

\[ 0 = (\alpha - \lambda)^n (T - \alpha I)^{m-1} v. \]

Because $(T - \alpha I)^{m-1} v \ne 0$, the equation above implies that $\alpha = \lambda$, as desired.
We saw earlier (5.11) that eigenvectors corresponding to distinct eigenvalues
are linearly independent. Now we prove a similar result for generalized
eigenvectors, with a proof that roughly follows the pattern of the proof of that earlier
result.


8.12 linearly independent generalized eigenvectors 


Suppose that $T \in \mathcal{L}(V)$. Then every list of generalized eigenvectors of $T$
corresponding to distinct eigenvalues of $T$ is linearly independent.


Proof  Suppose the desired result is false. Then there exists a smallest positive
integer $m$ such that there exists a linearly dependent list $v_1, \dots, v_m$ of generalized
eigenvectors of $T$ corresponding to distinct eigenvalues $\lambda_1, \dots, \lambda_m$ of $T$ (note that
$m \ge 2$ because a generalized eigenvector is, by definition, nonzero). Thus there
exist $a_1, \dots, a_m \in \mathbf{F}$, none of which are 0 (because of the minimality of $m$), such
that

\[ a_1 v_1 + \cdots + a_m v_m = 0. \]

Let $n = \dim V$. Apply $(T - \lambda_m I)^n$ to both sides of the equation above, getting

8.13   \[ a_1 (T - \lambda_m I)^n v_1 + \cdots + a_{m-1} (T - \lambda_m I)^n v_{m-1} = 0. \]

Suppose $k \in \{1, \dots, m - 1\}$. Then

\[ (T - \lambda_m I)^n v_k \ne 0 \]

because otherwise $v_k$ would be a generalized eigenvector of $T$ corresponding to
the distinct eigenvalues $\lambda_k$ and $\lambda_m$, which would contradict 8.11. However,

\[ (T - \lambda_k I)^n \bigl((T - \lambda_m I)^n v_k\bigr) = (T - \lambda_m I)^n \bigl((T - \lambda_k I)^n v_k\bigr) = 0. \]

Thus the last two displayed equations show that $(T - \lambda_m I)^n v_k$ is a generalized
eigenvector of $T$ corresponding to the eigenvalue $\lambda_k$. Hence

\[ (T - \lambda_m I)^n v_1, \dots, (T - \lambda_m I)^n v_{m-1} \]

is a linearly dependent list (by 8.13) of $m - 1$ generalized eigenvectors
corresponding to distinct eigenvalues, contradicting the minimality of $m$. This contradiction
completes the proof.


Nilpotent Operators 


8.14 definition: nilpotent 


An operator is called nilpotent if some power of it equals 0. 


Thus an operator $T$ on $V$ is nilpotent if and only if every nonzero vector in $V$ is a
generalized eigenvector of $T$ corresponding to the eigenvalue 0.
8.15 example: nilpotent operators

(a) The operator $T \in \mathcal{L}(\mathbf{F}^4)$ defined by

    \[ T(z_1, z_2, z_3, z_4) = (0, 0, z_1, z_2) \]

    is nilpotent because $T^2 = 0$.


(b) The operator on $\mathbf{F}^3$ whose matrix (with respect to the standard basis) is

    \[ \begin{pmatrix} -3 & 9 & 0 \\ -7 & 9 & 6 \\ 4 & 0 & -6 \end{pmatrix} \]

    is nilpotent, as can be shown by cubing the matrix above to get the zero matrix.


(c) The operator of differentiation on $\mathcal{P}_m(\mathbf{R})$ is nilpotent because the $(m + 1)$th
    derivative of every polynomial of degree at most $m$ equals 0. Note that on
    this space of dimension $m + 1$, we need to raise the nilpotent operator to the
    power $m + 1$ to get the 0 operator.
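
Parts (a) and (b) of the example above can be checked by direct matrix multiplication. The sketch below is an illustration with hypothetical helper names; `nilpotency_index` returns the smallest positive power that gives the zero matrix, trying only powers up to the size of the matrix (which suffices by 8.16), and returns `None` if the matrix is not nilpotent.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_zero(M):
    """True if every entry of M equals 0."""
    return all(x == 0 for row in M for x in row)

def nilpotency_index(A):
    """Smallest k >= 1 with A^k = 0, or None if no k <= dim works."""
    power = A
    for k in range(1, len(A) + 1):
        if is_zero(power):
            return k
        power = matmul(power, A)
    return None

# (a) Matrix of T(z1, z2, z3, z4) = (0, 0, z1, z2) on F^4.
shift = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 1, 0, 0]]

# (b) The 3-by-3 matrix from the example.
B = [[-3, 9, 0],
     [-7, 9, 6],
     [4, 0, -6]]

index_a = nilpotency_index(shift)
index_b = nilpotency_index(B)
index_identity = nilpotency_index([[1, 0], [0, 1]])  # not nilpotent
```

The first operator has square 0, the second needs the third power, and the identity is reported as not nilpotent.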


The next result shows that when raising a nilpotent operator to a power, we
never need to use a power higher than the dimension of the space. For a slightly
stronger result, see Exercise 18.

[Margin note: The Latin word nil means nothing or zero; the Latin word potens means
having power. Thus nilpotent literally means having a power that is zero.]

8.16 nilpotent operator raised to dimension of domain is 0 


Suppose $T \in \mathcal{L}(V)$ is nilpotent. Then $T^{\dim V} = 0$.


Proof  Because $T$ is nilpotent, there exists a positive integer $k$ such that $T^k = 0$.
Thus $\operatorname{null} T^k = V$. Now 8.1 and 8.3 imply that $\operatorname{null} T^{\dim V} = V$. Thus $T^{\dim V} = 0$.


8.17 eigenvalues of nilpotent operator 


Suppose $T \in \mathcal{L}(V)$.

(a) If $T$ is nilpotent, then 0 is an eigenvalue of $T$ and $T$ has no other
    eigenvalues.

(b) If $\mathbf{F} = \mathbf{C}$ and 0 is the only eigenvalue of $T$, then $T$ is nilpotent.


Proof

(a) To prove (a), suppose $T$ is nilpotent. Hence there is a positive integer $m$ such
    that $T^m = 0$. This implies that $T$ is not injective. Thus 0 is an eigenvalue
    of $T$.

    To show that $T$ has no other eigenvalues, suppose $\lambda$ is an eigenvalue of $T$.
    Then there exists a nonzero vector $v \in V$ such that

    \[ \lambda v = Tv. \]

    Repeatedly applying $T$ to both sides of this equation shows that

    \[ \lambda^m v = T^m v = 0. \]

    Thus $\lambda = 0$, as desired.

(b) Suppose $\mathbf{F} = \mathbf{C}$ and 0 is the only eigenvalue of $T$. By 5.27(b), the minimal
    polynomial of $T$ equals $z^m$ for some positive integer $m$. Thus $T^m = 0$. Hence
    $T$ is nilpotent.


Exercise 23 shows that the hypothesis that F = C cannot be deleted in (b) of 
the result above. 

Given an operator on V, we want to find a basis of V such that the matrix of 
the operator with respect to this basis is as simple as possible, meaning that the 
matrix contains many 0’s. The next result shows that if T is nilpotent, then we can 
choose a basis of V such that the matrix of T with respect to this basis has more 
than half of its entries equal to 0. Later in this chapter we will do even better. 


8.18 minimal polynomial and upper-triangular matrix of nilpotent operator 


Suppose $T \in \mathcal{L}(V)$. Then the following are equivalent.

(a) $T$ is nilpotent.

(b) The minimal polynomial of $T$ is $z^m$ for some positive integer $m$.

(c) There is a basis of $V$ with respect to which the matrix of $T$ has the form

\[ \begin{pmatrix} 0 & & * \\ & \ddots & \\ 0 & & 0 \end{pmatrix}, \]

where all entries on and below the diagonal equal 0.


Proof Suppose (a) holds, so $T$ is nilpotent. Thus there exists a positive integer
$n$ such that $T^n = 0$. Now 5.29 implies that $z^n$ is a polynomial multiple of the
minimal polynomial of $T$. Thus the minimal polynomial of $T$ is $z^m$ for some
positive integer $m$, proving that (a) implies (b).

Now suppose (b) holds, so the minimal polynomial of $T$ is $z^m$ for some positive
integer $m$. This implies, by 5.27(a), that 0 (which is the only zero of $z^m$) is the
only eigenvalue of $T$. This further implies, by 5.44, that there is a basis of $V$ with
respect to which the matrix of $T$ is upper triangular. This also implies, by 5.41,
that all entries on the diagonal of this matrix are 0, proving that (b) implies (c).

Now suppose (c) holds. Then 5.40 implies that $T^{\dim V} = 0$. Thus $T$ is nilpotent,
proving that (c) implies (a).
1 Suppose $T \in \mathcal{L}(V)$. Prove that if $\dim \operatorname{null} T^4 = 8$ and $\dim \operatorname{null} T^6 = 9$, then
$\dim \operatorname{null} T^m = 9$ for all integers $m \ge 5$.

2 Suppose $T \in \mathcal{L}(V)$, $m$ is a positive integer, $v \in V$, and $T^{m-1}v \ne 0$ but
$T^m v = 0$. Prove that $v, Tv, T^2 v, \dots, T^{m-1}v$ is linearly independent.
The result in this exercise is used in the proof of 8.45.

3 Suppose $T \in \mathcal{L}(V)$. Prove that

\[ V = \operatorname{null} T \oplus \operatorname{range} T \iff \operatorname{null} T^2 = \operatorname{null} T. \]

4 Suppose $T \in \mathcal{L}(V)$, $\lambda \in \mathbf{F}$, and $m$ is a positive integer such that the minimal
polynomial of $T$ is a polynomial multiple of $(z - \lambda)^m$. Prove that

\[ \dim \operatorname{null}(T - \lambda I)^m \ge m. \]

5 Suppose $T \in \mathcal{L}(V)$ and $m$ is a positive integer. Prove that

\[ \dim \operatorname{null} T^m \le m \dim \operatorname{null} T. \]

Hint: Exercise 21 in Section 3B may be useful.

6 Suppose $T \in \mathcal{L}(V)$. Show that

\[ V = \operatorname{range} T^0 \supseteq \operatorname{range} T^1 \supseteq \cdots \supseteq \operatorname{range} T^k \supseteq \operatorname{range} T^{k+1} \supseteq \cdots. \]

7 Suppose $T \in \mathcal{L}(V)$ and $m$ is a nonnegative integer such that

\[ \operatorname{range} T^m = \operatorname{range} T^{m+1}. \]

Prove that $\operatorname{range} T^k = \operatorname{range} T^m$ for all $k > m$.

8 Suppose $T \in \mathcal{L}(V)$. Prove that

\[ \operatorname{range} T^{\dim V} = \operatorname{range} T^{\dim V + 1} = \operatorname{range} T^{\dim V + 2} = \cdots. \]

9 Suppose $T \in \mathcal{L}(V)$ and $m$ is a nonnegative integer. Prove that

\[ \operatorname{null} T^m = \operatorname{null} T^{m+1} \iff \operatorname{range} T^m = \operatorname{range} T^{m+1}. \]

10 Define $T \in \mathcal{L}(\mathbf{C}^2)$ by $T(w, z) = (z, 0)$. Find all generalized eigenvectors
of $T$.

11 Suppose that $T \in \mathcal{L}(V)$. Prove that there is a basis of $V$ consisting of
generalized eigenvectors of $T$ if and only if the minimal polynomial of $T$
equals $(z - \lambda_1) \cdots (z - \lambda_m)$ for some $\lambda_1, \dots, \lambda_m \in \mathbf{F}$.

Assume $\mathbf{F} = \mathbf{R}$ because the case $\mathbf{F} = \mathbf{C}$ follows from 5.27(b) and 8.9.

This exercise states that the condition for there to be a basis of $V$ consisting
of generalized eigenvectors of $T$ is the same as the condition for there to be
a basis with respect to which $T$ has an upper-triangular matrix (see 5.44).
Caution: If $T$ has an upper-triangular matrix with respect to a basis
$v_1, \dots, v_n$ of $V$, then $v_1$ is an eigenvector of $T$ but it is not necessarily true
that $v_2, \dots, v_n$ are generalized eigenvectors of $T$.


12 Suppose $T \in \mathcal{L}(V)$ is such that every vector in $V$ is a generalized eigenvector
of $T$. Prove that there exists $\lambda \in \mathbf{F}$ such that $T - \lambda I$ is nilpotent.

13 Suppose $S, T \in \mathcal{L}(V)$ and $ST$ is nilpotent. Prove that $TS$ is nilpotent.

14 Suppose $T \in \mathcal{L}(V)$ is nilpotent and $T \ne 0$. Prove $T$ is not diagonalizable.

15 Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Prove that $T$ is diagonalizable if and only if
every generalized eigenvector of $T$ is an eigenvector of $T$.

For $\mathbf{F} = \mathbf{C}$, this exercise adds another equivalence to the list of conditions
for diagonalizability in 5.55.

16 (a) Give an example of nilpotent operators $S, T$ on the same vector space
such that neither $S + T$ nor $ST$ is nilpotent.

(b) Suppose $S, T \in \mathcal{L}(V)$ are nilpotent and $ST = TS$. Prove that $S + T$ and
$ST$ are nilpotent.

17 Suppose $T \in \mathcal{L}(V)$ is nilpotent and $m$ is a positive integer such that $T^m = 0$.

(a) Prove that $I - T$ is invertible and that $(I - T)^{-1} = I + T + \cdots + T^{m-1}$.

(b) Explain how you would guess the formula above.

18 Suppose $T \in \mathcal{L}(V)$ is nilpotent. Prove that $T^{1 + \dim \operatorname{range} T} = 0$.

If $\dim \operatorname{range} T < \dim V - 1$, then this exercise improves 8.16.

19 Suppose $T \in \mathcal{L}(V)$ is not nilpotent. Show that

\[ V = \operatorname{null} T^{\dim V - 1} \oplus \operatorname{range} T^{\dim V - 1}. \]

For operators that are not nilpotent, this exercise improves 8.4.

20 Suppose $V$ is an inner product space and $T \in \mathcal{L}(V)$ is normal and nilpotent.
Prove that $T = 0$.

21 Suppose $T \in \mathcal{L}(V)$ is such that $\operatorname{null} T^{\dim V - 1} \ne \operatorname{null} T^{\dim V}$. Prove that $T$ is
nilpotent and that $\dim \operatorname{null} T^k = k$ for every integer $k$ with $0 \le k \le \dim V$.

22 Suppose $T \in \mathcal{L}(\mathbf{C}^5)$ is such that $\operatorname{range} T^4 \ne \operatorname{range} T^5$. Prove that $T$ is
nilpotent.

23 Give an example of an operator $T$ on a finite-dimensional real vector space
such that 0 is the only eigenvalue of $T$ but $T$ is not nilpotent.

This exercise shows that the implication (b) $\Rightarrow$ (a) in 8.17 does not hold
without the hypothesis that $\mathbf{F} = \mathbf{C}$.

24 For each item in Example 8.15, find a basis of the domain vector space such
that the matrix of the nilpotent operator with respect to that basis has the
upper-triangular form promised by 8.18(c).

25 Suppose that $V$ is an inner product space and $T \in \mathcal{L}(V)$ is nilpotent. Show
that there is an orthonormal basis of $V$ with respect to which the matrix of $T$
has the upper-triangular form promised by 8.18(c).


Generalized Eigenspaces


8.19 definition: generalized eigenspace, $G(\lambda, T)$

Suppose $T \in \mathcal{L}(V)$ and $\lambda \in \mathbf{F}$. The generalized eigenspace of $T$ corresponding
to $\lambda$, denoted by $G(\lambda, T)$, is defined by

\[ G(\lambda, T) = \{ v \in V : (T - \lambda I)^k v = 0 \text{ for some positive integer } k \}. \]

Thus $G(\lambda, T)$ is the set of generalized eigenvectors of $T$ corresponding to $\lambda$,
along with the 0 vector.

Because every eigenvector of $T$ is a generalized eigenvector of $T$ (take $k = 1$
in the definition of generalized eigenvector), each eigenspace is contained in the
corresponding generalized eigenspace. In other words, if $T \in \mathcal{L}(V)$ and $\lambda \in \mathbf{F}$,
then $E(\lambda, T) \subseteq G(\lambda, T)$.

The next result implies that if $T \in \mathcal{L}(V)$ and $\lambda \in \mathbf{F}$, then the generalized
eigenspace $G(\lambda, T)$ is a subspace of $V$ (because the null space of each linear map
on $V$ is a subspace of $V$).


8.20 description of generalized eigenspaces 


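Suppose $T \in \mathcal{L}(V)$ and $\lambda \in \mathbf{F}$. Then $G(\lambda, T) = \operatorname{null}(T - \lambda I)^{\dim V}$.

Proof Suppose $v \in \operatorname{null}(T - \lambda I)^{\dim V}$. The definitions imply $v \in G(\lambda, T)$. Thus
$G(\lambda, T) \supseteq \operatorname{null}(T - \lambda I)^{\dim V}$.

Conversely, suppose $v \in G(\lambda, T)$. Thus there is a positive integer $k$ such
that $v \in \operatorname{null}(T - \lambda I)^k$. From 8.1 and 8.3 (with $T - \lambda I$ replacing $T$), we get
$v \in \operatorname{null}(T - \lambda I)^{\dim V}$. Thus $G(\lambda, T) \subseteq \operatorname{null}(T - \lambda I)^{\dim V}$, completing the proof.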


8.21 example: generalized eigenspaces of an operator on $\mathbf{C}^3$

Define $T \in \mathcal{L}(\mathbf{C}^3)$ by

\[ T(z_1, z_2, z_3) = (4z_2, 0, 5z_3). \]

In Example 8.10, we saw that the eigenvalues of $T$ are 0 and 5, and we found
the corresponding sets of generalized eigenvectors. Taking the union of those sets
with $\{0\}$, we have

\[ G(0, T) = \{(z_1, z_2, 0) : z_1, z_2 \in \mathbf{C}\} \quad\text{and}\quad G(5, T) = \{(0, 0, z_3) : z_3 \in \mathbf{C}\}. \]

Note that $\mathbf{C}^3 = G(0, T) \oplus G(5, T)$.
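
The description in 8.20 gives a finite test for membership in a generalized eigenspace: apply $T - \lambda I$ to a vector $\dim V$ times and see whether the result is 0. The sketch below runs that test on the operator of Example 8.21, written as its matrix with respect to the standard basis; the function names are ours and are only illustrative.

```python
def apply(a, v):
    """Apply the matrix a (a list of rows) to the vector v."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


def in_generalized_eigenspace(a, lam, v):
    """Test whether (a - lam*I)^(dim V) v == 0, the criterion from 8.20."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    for _ in range(n):
        v = apply(shifted, v)
    return all(x == 0 for x in v)


# Matrix of T(z1, z2, z3) = (4*z2, 0, 5*z3) with respect to the standard basis.
T = [[0, 4, 0],
     [0, 0, 0],
     [0, 0, 5]]

print(in_generalized_eigenspace(T, 0, [1, 0, 0]))
print(in_generalized_eigenspace(T, 0, [0, 1, 0]))
print(in_generalized_eigenspace(T, 5, [0, 0, 1]))
print(in_generalized_eigenspace(T, 5, [0, 1, 0]))
```

The vector (0, 1, 0) passes the test for the eigenvalue 0 even though it is not an eigenvector, because $T(0, 1, 0) = (4, 0, 0) \ne 0$.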
In Example 8.21, the domain space $\mathbf{C}^3$ is the direct sum of the generalized
eigenspaces of the operator $T$ in that example. Our next result shows that this
behavior holds in general. Specifically, the following major result shows that if
$\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$, then $V$ is the direct sum of the generalized eigenspaces
of $T$, each of which is invariant under $T$ and on which $T$ is a nilpotent operator
plus a scalar multiple of the identity. Thus the next result achieves our goal of
decomposing $V$ into invariant subspaces on which $T$ has a known behavior.

As we will see, the proof follows from putting together what we have learned
about generalized eigenspaces and then using our result that for each operator
$T \in \mathcal{L}(V)$, there exists a basis of $V$ consisting of generalized eigenvectors of $T$.


8.22 generalized eigenspace decomposition 


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues
of $T$. Then

(a) $G(\lambda_k, T)$ is invariant under $T$ for each $k = 1, \dots, m$;

(b) $(T - \lambda_k I)|_{G(\lambda_k, T)}$ is nilpotent for each $k = 1, \dots, m$;

(c) $V = G(\lambda_1, T) \oplus \cdots \oplus G(\lambda_m, T)$.

Proof

(a) Suppose $k \in \{1, \dots, m\}$. Then 8.20 shows that

\[ G(\lambda_k, T) = \operatorname{null}(T - \lambda_k I)^{\dim V}. \]

Thus 5.18, with $p(z) = (z - \lambda_k)^{\dim V}$, implies that $G(\lambda_k, T)$ is invariant under $T$,
proving (a).

(b) Suppose $k \in \{1, \dots, m\}$. If $v \in G(\lambda_k, T)$, then $(T - \lambda_k I)^{\dim V} v = 0$ (by 8.20).
Thus $\bigl((T - \lambda_k I)|_{G(\lambda_k, T)}\bigr)^{\dim V} = 0$. Hence $(T - \lambda_k I)|_{G(\lambda_k, T)}$ is nilpotent,
proving (b).

(c) To show that $G(\lambda_1, T) + \cdots + G(\lambda_m, T)$ is a direct sum, suppose

\[ v_1 + \cdots + v_m = 0, \]

where each $v_k$ is in $G(\lambda_k, T)$. Because generalized eigenvectors of $T$ corresponding
to distinct eigenvalues are linearly independent (by 8.12), this
implies that each $v_k$ equals 0. Thus $G(\lambda_1, T) + \cdots + G(\lambda_m, T)$ is a direct sum
(by 1.45).

Finally, each vector in $V$ can be written as a finite sum of generalized eigenvectors
of $T$ (by 8.9). Thus

\[ V = G(\lambda_1, T) \oplus \cdots \oplus G(\lambda_m, T), \]

proving (c).


For the analogous result when F = R, see Exercise 8. 




Multiplicity of an Eigenvalue 


If $V$ is a complex vector space and $T \in \mathcal{L}(V)$, then the decomposition of $V$ provided
by the generalized eigenspace decomposition (8.22) can be a powerful tool.
The dimensions of the subspaces involved in this decomposition are sufficiently 
important to get a name, which is given in the next definition. 


8.23 definition: multiplicity 


• Suppose $T \in \mathcal{L}(V)$. The multiplicity of an eigenvalue $\lambda$ of $T$ is defined to
be the dimension of the corresponding generalized eigenspace $G(\lambda, T)$.

• In other words, the multiplicity of an eigenvalue $\lambda$ of $T$ equals

\[ \dim \operatorname{null}(T - \lambda I)^{\dim V}. \]

The second bullet point above holds because $G(\lambda, T) = \operatorname{null}(T - \lambda I)^{\dim V}$ (see
8.20).


8.24 example: multiplicity of each eigenvalue of an operator 


Suppose $T \in \mathcal{L}(\mathbf{C}^3)$ is defined by

\[ T(z_1, z_2, z_3) = (6z_1 + 3z_2 + 4z_3, 6z_2 + 2z_3, 7z_3). \]

The matrix of $T$ (with respect to the standard basis) is

\[ \begin{pmatrix} 6 & 3 & 4 \\ 0 & 6 & 2 \\ 0 & 0 & 7 \end{pmatrix}. \]

The eigenvalues of $T$ are the diagonal entries 6 and 7, as follows from 5.41. You
can verify that the generalized eigenspaces of $T$ are as follows:

\[ G(6, T) = \operatorname{span}((1, 0, 0), (0, 1, 0)) \quad\text{and}\quad G(7, T) = \operatorname{span}((10, 2, 1)). \]

Thus the eigenvalue 6 has multiplicity 2 and the eigenvalue 7 has multiplicity 1.
The direct sum $\mathbf{C}^3 = G(6, T) \oplus G(7, T)$ is the generalized eigenspace decomposition
promised by 8.22. A basis of $\mathbf{C}^3$ consisting of generalized eigenvectors of $T$,
as promised by 8.9, is

\[ (1, 0, 0), (0, 1, 0), (10, 2, 1). \]

There does not exist a basis of $\mathbf{C}^3$ consisting of eigenvectors of this operator.

In this example, the multiplicity of each eigenvalue equals the number of times
that eigenvalue appears on the diagonal of an upper-triangular matrix representing
the operator. This behavior always happens, as we will see in 8.31.


In the example above, the sum of the multiplicities of the eigenvalues of T 
equals 3, which is the dimension of the domain of T. The next result shows that 
this holds for all operators on finite-dimensional complex vector spaces. 
8.25 sum of the multiplicities equals $\dim V$

Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Then the sum of the multiplicities of all
eigenvalues of $T$ equals $\dim V$.


Proof The desired result follows from the generalized eigenspace decomposition 
(8.22) and the formula for the dimension of a direct sum (see 3.94). 


The terms algebraic multiplicity and geometric multiplicity are used in some
books. In case you encounter this terminology, be aware that the algebraic multiplicity
is the same as the multiplicity defined here and the geometric multiplicity
is the dimension of the corresponding eigenspace. In other words, if $T \in \mathcal{L}(V)$
and $\lambda$ is an eigenvalue of $T$, then

\[ \text{algebraic multiplicity of } \lambda = \dim \operatorname{null}(T - \lambda I)^{\dim V} = \dim G(\lambda, T), \]
\[ \text{geometric multiplicity of } \lambda = \dim \operatorname{null}(T - \lambda I) = \dim E(\lambda, T). \]


Note that as defined above, the algebraic multiplicity also has a geometric meaning 
as the dimension of a certain null space. The definition of multiplicity given here 
is cleaner than the traditional definition that involves determinants; 9.62 implies 
that these definitions are equivalent. 
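
To make the two notions concrete, the following sketch computes both multiplicities for the matrix of Example 8.24 as dimensions of null spaces, using exact rational row reduction. It is an illustrative sketch only; the helper names (`rank`, `nullity_of_power`) are ours.

```python
from fractions import Fraction


def rank(m):
    """Rank of a matrix via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    rows, cols = len(m), len(m[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def nullity_of_power(a, lam, p):
    """Return dim null (a - lam*I)^p."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(p):
        power = matmul(power, shifted)
    return n - rank(power)


# Matrix from Example 8.24.
A = [[6, 3, 4],
     [0, 6, 2],
     [0, 0, 7]]
n = len(A)

multiplicities = {}
for lam in (6, 7):
    algebraic = nullity_of_power(A, lam, n)  # dim G(lam, T)
    geometric = nullity_of_power(A, lam, 1)  # dim E(lam, T)
    multiplicities[lam] = (algebraic, geometric)
print(multiplicities)
```

For the eigenvalue 6 the two numbers differ, which is another way of seeing that this operator has no basis of eigenvectors.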

If $V$ is an inner product space, $T \in \mathcal{L}(V)$ is normal, and $\lambda$ is an eigenvalue
of $T$, then the algebraic multiplicity of $\lambda$ equals the geometric multiplicity of $\lambda$,
as can be seen from applying Exercise 27 in Section 7A to the normal operator
$T - \lambda I$. As a special case, the singular values of $S \in \mathcal{L}(V, W)$ (here $V$ and $W$ are
both finite-dimensional inner product spaces) depend on the multiplicities (either
algebraic or geometric) of the eigenvalues of the self-adjoint operator $S^*S$.

The next definition associates a monic polynomial with each operator on a 
finite-dimensional complex vector space. 


8.26 definition: characteristic polynomial 


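Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Let $\lambda_1, \dots, \lambda_m$ denote the distinct eigenvalues
of $T$, with multiplicities $d_1, \dots, d_m$. The polynomial

\[ (z - \lambda_1)^{d_1} \cdots (z - \lambda_m)^{d_m} \]

is called the characteristic polynomial of $T$.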


8.27 example: the characteristic polynomial of an operator 


Suppose $T \in \mathcal{L}(\mathbf{C}^3)$ is defined as in Example 8.24. Because the eigenvalues of
$T$ are 6, with multiplicity 2, and 7, with multiplicity 1, we see that the characteristic
polynomial of $T$ is $(z - 6)^2 (z - 7)$.


8.28 degree and zeros of characteristic polynomial


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Then


(a) the characteristic polynomial of T has degree dim V; 


(b) the zeros of the characteristic polynomial of T are the eigenvalues of T. 


Proof Our result about the sum of the multiplicities (8.25) implies (a). The 
definition of the characteristic polynomial implies (b). 


Most texts define the characteristic polynomial using determinants (the two
definitions are equivalent by 9.62). The approach taken here, which is considerably
simpler, leads to the following nice proof of the Cayley-Hamilton theorem.


8.29 Cayley—Hamilton theorem 


Suppose $\mathbf{F} = \mathbf{C}$, $T \in \mathcal{L}(V)$, and $q$ is the characteristic polynomial of $T$. Then
$q(T) = 0$.

Proof Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues of $T$, and let $d_k = \dim G(\lambda_k, T)$.
For each $k \in \{1, \dots, m\}$, we know that $(T - \lambda_k I)|_{G(\lambda_k, T)}$ is nilpotent. Thus we have

\[ (T - \lambda_k I)^{d_k}|_{G(\lambda_k, T)} = 0 \]

(by 8.16) for each $k \in \{1, \dots, m\}$.

Arthur Cayley (1821-1895) published three mathematics papers before completing
his undergraduate degree.

The generalized eigenspace decomposition (8.22) states that every vector in $V$ is a
sum of vectors in $G(\lambda_1, T), \dots, G(\lambda_m, T)$. Thus to prove that $q(T) = 0$, we only
need to show that $q(T)|_{G(\lambda_k, T)} = 0$ for each $k$.

Fix $k \in \{1, \dots, m\}$. We have

\[ q(T) = (T - \lambda_1 I)^{d_1} \cdots (T - \lambda_m I)^{d_m}. \]

The operators on the right side of the equation above all commute, so we can
move the factor $(T - \lambda_k I)^{d_k}$ to be the last term in the expression on the right.
Because $(T - \lambda_k I)^{d_k}|_{G(\lambda_k, T)} = 0$, we have $q(T)|_{G(\lambda_k, T)} = 0$, as desired.
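
The Cayley-Hamilton theorem can be checked by hand on the operator of Example 8.24, whose characteristic polynomial is $(z - 6)^2 (z - 7)$ by Example 8.27. The sketch below does the arithmetic with integer matrices; it also evaluates the proper divisor $(z - 6)(z - 7)$ to show that this smaller polynomial does not annihilate the operator. The helper names are ours.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shifted(a, lam):
    """Return a - lam*I."""
    n = len(a)
    return [[a[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def is_zero(a):
    return all(x == 0 for row in a for x in row)


# Matrix from Example 8.24.
A = [[6, 3, 4],
     [0, 6, 2],
     [0, 0, 7]]

# q(A) for q(z) = (z - 6)^2 (z - 7), the characteristic polynomial.
q_of_A = matmul(matmul(shifted(A, 6), shifted(A, 6)), shifted(A, 7))

# p(A) for the proper divisor p(z) = (z - 6)(z - 7).
p_of_A = matmul(shifted(A, 6), shifted(A, 7))

print(is_zero(q_of_A), is_zero(p_of_A))
```

Since $(z - 6)(z - 7)$ does not annihilate the operator, the minimal polynomial of this operator must contain the factor $(z - 6)^2$, so here the minimal and characteristic polynomials coincide.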


The next result implies that if the minimal polynomial of an operator $T \in \mathcal{L}(V)$
has degree $\dim V$ (as happens almost always; see the paragraphs following 5.24),
then the characteristic polynomial of $T$ equals the minimal polynomial of $T$.


8.30 characteristic polynomial is a multiple of minimal polynomial 


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Then the characteristic polynomial of $T$ is a
polynomial multiple of the minimal polynomial of $T$.


Proof The desired result follows immediately from the Cayley-Hamilton theorem
(8.29) and 5.29.

Now we can prove that the result suggested by Example 8.24 holds for all
operators on finite-dimensional complex vector spaces.


8.31 multiplicity of an eigenvalue equals number of times on diagonal 


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Suppose $v_1, \dots, v_n$ is a basis of $V$ such that
$\mathcal{M}(T, (v_1, \dots, v_n))$ is upper triangular. Then the number of times that each
eigenvalue $\lambda$ of $T$ appears on the diagonal of $\mathcal{M}(T, (v_1, \dots, v_n))$ equals the
multiplicity of $\lambda$ as an eigenvalue of $T$.

Proof Let $A = \mathcal{M}(T, (v_1, \dots, v_n))$. Thus $A$ is an upper-triangular matrix. Let
$\lambda_1, \dots, \lambda_n$ denote the entries on the diagonal of $A$. Thus for each $k \in \{1, \dots, n\}$,
we have

\[ Tv_k = u_k + \lambda_k v_k \]

for some $u_k \in \operatorname{span}(v_1, \dots, v_{k-1})$. Hence if $k \in \{1, \dots, n\}$ and $\lambda_k \ne 0$, then $Tv_k$ is
not a linear combination of $Tv_1, \dots, Tv_{k-1}$. The linear dependence lemma (2.19)
now implies that the list of those $Tv_k$ such that $\lambda_k \ne 0$ is linearly independent.

Let $d$ denote the number of indices $k \in \{1, \dots, n\}$ such that $\lambda_k = 0$. The
conclusion of the previous paragraph implies that

\[ \dim \operatorname{range} T \ge n - d. \]

Because $n = \dim V = \dim \operatorname{null} T + \dim \operatorname{range} T$, the inequality above implies that

8.32 \[ \dim \operatorname{null} T \le d. \]

The matrix of the operator $T^n$ with respect to the basis $v_1, \dots, v_n$ is the upper-triangular
matrix $A^n$, which has diagonal entries $\lambda_1^n, \dots, \lambda_n^n$ [see Exercise 2(b) in
Section 5C]. Because $\lambda_k^n = 0$ if and only if $\lambda_k = 0$, the number of times that 0
appears on the diagonal of $A^n$ equals $d$. Thus applying 8.32 with $T$ replaced with
$T^n$, we have

8.33 \[ \dim \operatorname{null} T^n \le d. \]

For $\lambda$ an eigenvalue of $T$, let $m_\lambda$ denote the multiplicity of $\lambda$ as an eigenvalue
of $T$ and let $d_\lambda$ denote the number of times that $\lambda$ appears on the diagonal of $A$.
Replacing $T$ in 8.33 with $T - \lambda I$, we see that

8.34 \[ m_\lambda \le d_\lambda \]

for each eigenvalue $\lambda$ of $T$. The sum of the multiplicities $m_\lambda$ over all eigenvalues
$\lambda$ of $T$ equals $n$, the dimension of $V$ (by 8.25). The sum of the numbers $d_\lambda$ over
all eigenvalues $\lambda$ of $T$ also equals $n$, because the diagonal of $A$ has length $n$.

Thus summing both sides of 8.34 over all eigenvalues $\lambda$ of $T$ produces an
equality. Hence 8.34 must actually be an equality for each eigenvalue $\lambda$ of $T$.
Thus the multiplicity of $\lambda$ as an eigenvalue of $T$ equals the number of times that
$\lambda$ appears on the diagonal of $A$, as desired.
Block Diagonal Matrices


To interpret our results in matrix form, we make the following definition,
generalizing the notion of a diagonal matrix. If each matrix $A_k$ in the definition
below is a 1-by-1 matrix, then we actually have a diagonal matrix.

Often we can understand a matrix better by thinking of it as composed
of smaller matrices.


8.35 definition: block diagonal matrix 


A block diagonal matrix is a square matrix of the form

\[ \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix}, \]

where $A_1, \dots, A_m$ are square matrices lying along the diagonal and all other
entries of the matrix equal 0.


8.36 example: a block diagonal matrix 


The 5-by-5 matrix

\[ A = \left(\begin{array}{c|cc|cc}
4 & 0 & 0 & 0 & 0 \\ \hline
0 & 2 & -3 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 \\ \hline
0 & 0 & 0 & 1 & 7 \\
0 & 0 & 0 & 0 & 1
\end{array}\right) \]

is a block diagonal matrix with

\[ A = \begin{pmatrix} A_1 & & 0 \\ & A_2 & \\ 0 & & A_3 \end{pmatrix}, \]

where

\[ A_1 = \begin{pmatrix} 4 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 2 & -3 \\ 0 & 2 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 1 & 7 \\ 0 & 1 \end{pmatrix}. \]

Here the inner matrices in the 5-by-5 matrix above are blocked off to show how
we can think of it as a block diagonal matrix.


Note that in the example above, each of $A_1, A_2, A_3$ is an upper-triangular
matrix whose diagonal entries are all equal. The next result shows that with
respect to an appropriate basis, every operator on a finite-dimensional complex
vector space has a matrix of this form. Note that this result gives us many more
zeros in the matrix than are needed to make it upper triangular.
8.37 block diagonal matrix with upper-triangular blocks


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues
of $T$, with multiplicities $d_1, \dots, d_m$. Then there is a basis of $V$ with respect to
which $T$ has a block diagonal matrix of the form

\[ \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix}, \]

where each $A_k$ is a $d_k$-by-$d_k$ upper-triangular matrix of the form

\[ A_k = \begin{pmatrix} \lambda_k & & * \\ & \ddots & \\ 0 & & \lambda_k \end{pmatrix}. \]

Proof Each $(T - \lambda_k I)|_{G(\lambda_k, T)}$ is nilpotent (see 8.22). For each $k$, choose a basis
of $G(\lambda_k, T)$, which is a vector space of dimension $d_k$, such that the matrix of
$(T - \lambda_k I)|_{G(\lambda_k, T)}$ with respect to this basis is as in 8.18(c). Thus with respect to
this basis, the matrix of $T|_{G(\lambda_k, T)}$, which equals $(T - \lambda_k I)|_{G(\lambda_k, T)} + \lambda_k I|_{G(\lambda_k, T)}$,
looks like the desired form shown above for $A_k$.

The generalized eigenspace decomposition (8.22) shows that putting together
the bases of the $G(\lambda_k, T)$'s chosen above gives a basis of $V$. The matrix of $T$ with
respect to this basis has the desired form.


8.38 example: block diagonal matrix via generalized eigenvectors 


Let $T \in \mathcal{L}(\mathbf{C}^3)$ be defined by $T(z_1, z_2, z_3) = (6z_1 + 3z_2 + 4z_3, 6z_2 + 2z_3, 7z_3)$.
The matrix of $T$ (with respect to the standard basis) is

\[ \begin{pmatrix} 6 & 3 & 4 \\ 0 & 6 & 2 \\ 0 & 0 & 7 \end{pmatrix}, \]

which is an upper-triangular matrix but is not of the form promised by 8.37.
As we saw in Example 8.24, the eigenvalues of $T$ are 6 and 7, and

\[ G(6, T) = \operatorname{span}((1, 0, 0), (0, 1, 0)) \quad\text{and}\quad G(7, T) = \operatorname{span}((10, 2, 1)). \]

We also saw that a basis of $\mathbf{C}^3$ consisting of generalized eigenvectors of $T$ is

\[ (1, 0, 0), (0, 1, 0), (10, 2, 1). \]

The matrix of $T$ with respect to this basis is

\[ \left(\begin{array}{cc|c} 6 & 3 & 0 \\ 0 & 6 & 0 \\ \hline 0 & 0 & 7 \end{array}\right), \]


which is a matrix of the block diagonal form promised by 8.37. 
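
One way to check the matrix in Example 8.38 is to use the defining property of the matrix of an operator with respect to a basis: if the basis vectors are the columns of a matrix `S` and `M` is the claimed matrix, then applying the operator to each basis vector must give the combination of basis vectors recorded in the corresponding column of `M`, which is the identity `A S = S M`. The sketch below verifies that identity; the names `S` and `M` are ours.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Standard-basis matrix of T from Example 8.38.
A = [[6, 3, 4],
     [0, 6, 2],
     [0, 0, 7]]

# Columns are the basis vectors (1, 0, 0), (0, 1, 0), (10, 2, 1).
S = [[1, 0, 10],
     [0, 1, 2],
     [0, 0, 1]]

# Block diagonal matrix claimed in the example.
M = [[6, 3, 0],
     [0, 6, 0],
     [0, 0, 7]]

# A S applies T to each basis vector; S M forms the combinations given by M.
same = matmul(A, S) == matmul(S, M)
print(same)
```

The third column of `A S` is (70, 14, 7), which is 7 times (10, 2, 1), so the last basis vector is an honest eigenvector with eigenvalue 7.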




Exercises 8B 


1 Define $T \in \mathcal{L}(\mathbf{C}^2)$ by $T(w, z) = (-z, w)$. Find the generalized eigenspaces
corresponding to the distinct eigenvalues of $T$.

2 Suppose $T \in \mathcal{L}(V)$ is invertible. Prove that $G(\lambda, T) = G(\tfrac{1}{\lambda}, T^{-1})$ for every
$\lambda \in \mathbf{F}$ with $\lambda \ne 0$.

3 Suppose $T \in \mathcal{L}(V)$. Suppose $S \in \mathcal{L}(V)$ is invertible. Prove that $T$ and
$S^{-1}TS$ have the same eigenvalues with the same multiplicities.

4 Suppose $\dim V \ge 2$ and $T \in \mathcal{L}(V)$ is such that $\operatorname{null} T^{\dim V - 2} \ne \operatorname{null} T^{\dim V - 1}$.
Prove that $T$ has at most two distinct eigenvalues.

5 Suppose $T \in \mathcal{L}(V)$ and 3 and 8 are eigenvalues of $T$. Let $n = \dim V$. Prove
that $V = (\operatorname{null} T^{n-2}) \oplus (\operatorname{range} T^{n-2})$.

6 Suppose $T \in \mathcal{L}(V)$ and $\lambda$ is an eigenvalue of $T$. Explain why the exponent
of $z - \lambda$ in the factorization of the minimal polynomial of $T$ is the smallest
positive integer $m$ such that $(T - \lambda I)^m|_{G(\lambda, T)} = 0$.

7 Suppose $T \in \mathcal{L}(V)$ and $\lambda$ is an eigenvalue of $T$ with multiplicity $d$. Prove
that $G(\lambda, T) = \operatorname{null}(T - \lambda I)^d$.

If $d < \dim V$, then this exercise improves 8.20.

8 Suppose $T \in \mathcal{L}(V)$ and $\lambda_1, \dots, \lambda_m$ are the distinct eigenvalues of $T$. Prove
that

\[ V = G(\lambda_1, T) \oplus \cdots \oplus G(\lambda_m, T) \]

if and only if the minimal polynomial of $T$ equals $(z - \lambda_1)^{k_1} \cdots (z - \lambda_m)^{k_m}$
for some positive integers $k_1, \dots, k_m$.

The case $\mathbf{F} = \mathbf{C}$ follows immediately from 5.27(b) and the generalized
eigenspace decomposition (8.22); thus this exercise is interesting only when
$\mathbf{F} = \mathbf{R}$.

9 Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Prove that there exist $D, N \in \mathcal{L}(V)$
such that $T = D + N$, the operator $D$ is diagonalizable, $N$ is nilpotent, and
$DN = ND$.

10 Suppose $V$ is a complex inner product space, $e_1, \dots, e_n$ is an orthonormal
basis of $V$, and $T \in \mathcal{L}(V)$. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $T$, each
included as many times as its multiplicity. Prove that

\[ |\lambda_1|^2 + \cdots + |\lambda_n|^2 \le \|Te_1\|^2 + \cdots + \|Te_n\|^2. \]

See the comment after Exercise 5 in Section 7A.

11 Give an example of an operator on $\mathbf{C}^4$ whose characteristic polynomial
equals $(z - 7)^2 (z - 8)^2$.


12 Give an example of an operator on $\mathbf{C}^4$ whose characteristic polynomial
equals $(z - 1)(z - 5)^3$ and whose minimal polynomial equals $(z - 1)(z - 5)^2$.

13 Give an example of an operator on $\mathbf{C}^4$ whose characteristic and minimal
polynomials both equal $z(z - 1)^2 (z - 3)$.

14 Give an example of an operator on $\mathbf{C}^4$ whose characteristic polynomial equals
$z(z - 1)^2 (z - 3)$ and whose minimal polynomial equals $z(z - 1)(z - 3)$.

15 Let $T$ be the operator on $\mathbf{C}^4$ defined by $T(z_1, z_2, z_3, z_4) = (0, z_1, z_2, z_3)$. Find
the characteristic polynomial and the minimal polynomial of $T$.

16 Let $T$ be the operator on $\mathbf{C}^6$ defined by

\[ T(z_1, z_2, z_3, z_4, z_5, z_6) = (0, z_1, z_2, 0, z_4, 0). \]

Find the characteristic polynomial and the minimal polynomial of $T$.

17 Suppose $\mathbf{F} = \mathbf{C}$ and $P \in \mathcal{L}(V)$ is such that $P^2 = P$. Prove that the characteristic
polynomial of $P$ is $z^m (z - 1)^n$, where $m = \dim \operatorname{null} P$ and $n = \dim \operatorname{range} P$.

18 Suppose $T \in \mathcal{L}(V)$ and $\lambda$ is an eigenvalue of $T$. Explain why the following
four numbers equal each other.

(a) The exponent of $z - \lambda$ in the factorization of the minimal polynomial
of $T$.

(b) The smallest positive integer $m$ such that $(T - \lambda I)^m|_{G(\lambda, T)} = 0$.

(c) The smallest positive integer $m$ such that

\[ \operatorname{null}(T - \lambda I)^m = \operatorname{null}(T - \lambda I)^{m+1}. \]

(d) The smallest positive integer $m$ such that

\[ \operatorname{range}(T - \lambda I)^m = \operatorname{range}(T - \lambda I)^{m+1}. \]

19 Suppose $\mathbf{F} = \mathbf{C}$ and $S \in \mathcal{L}(V)$ is a unitary operator. Prove that the constant
term in the characteristic polynomial of $S$ has absolute value 1.

20 Suppose that $\mathbf{F} = \mathbf{C}$ and $V_1, \dots, V_m$ are nonzero subspaces of $V$ such that

\[ V = V_1 \oplus \cdots \oplus V_m. \]

Suppose $T \in \mathcal{L}(V)$ and each $V_k$ is invariant under $T$. For each $k$, let $p_k$
denote the characteristic polynomial of $T|_{V_k}$. Prove that the characteristic
polynomial of $T$ equals $p_1 \cdots p_m$.

21 Suppose $p, q \in \mathcal{P}(\mathbf{C})$ are monic polynomials with the same zeros and $q$ is a
polynomial multiple of $p$. Prove that there exists $T \in \mathcal{L}(\mathbf{C}^{\deg q})$ such that
the characteristic polynomial of $T$ is $q$ and the minimal polynomial of $T$ is $p$.

This exercise implies that every monic polynomial is the characteristic
polynomial of some operator.
22 Suppose $A$ and $B$ are block diagonal matrices of the form

\[ A = \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix}, \quad B = \begin{pmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_m \end{pmatrix}, \]

where $A_k$ and $B_k$ are square matrices of the same size for each $k = 1, \dots, m$.
Show that $AB$ is a block diagonal matrix of the form

\[ AB = \begin{pmatrix} A_1 B_1 & & 0 \\ & \ddots & \\ 0 & & A_m B_m \end{pmatrix}. \]

23 Suppose $\mathbf{F} = \mathbf{R}$, $T \in \mathcal{L}(V)$, and $\lambda \in \mathbf{C}$.

(a) Show that $u + iv \in G(\lambda, T_{\mathbf{C}})$ if and only if $u - iv \in G(\bar{\lambda}, T_{\mathbf{C}})$.

(b) Show that the multiplicity of $\lambda$ as an eigenvalue of $T_{\mathbf{C}}$ equals the
multiplicity of $\bar{\lambda}$ as an eigenvalue of $T_{\mathbf{C}}$.

(c) Use (b) and the result about the sum of the multiplicities (8.25) to show
that if $\dim V$ is an odd number, then $T_{\mathbf{C}}$ has a real eigenvalue.

(d) Use (c) and the result about real eigenvalues of $T_{\mathbf{C}}$ (Exercise 17 in
Section 5A) to show that if $\dim V$ is an odd number, then $T$ has an
eigenvalue (thus giving an alternative proof of 5.34).

See Exercise 33 in Section 3B for the definition of the complexification $T_{\mathbf{C}}$.
8C Consequences of Generalized Eigenspace Decomposition


Square Roots of Operators


Recall that a square root of an operator $T \in \mathcal{L}(V)$ is an operator $R \in \mathcal{L}(V)$ such
that $R^2 = T$ (see 7.36). Every complex number has a square root, but not every
operator on a complex vector space has a square root. For example, the operator
on $\mathbf{C}^3$ defined by $T(z_1, z_2, z_3) = (z_2, z_3, 0)$ does not have a square root, as you are
asked to show in Exercise 1. The noninvertibility of that operator is no accident,
as we will soon see. We begin by showing that the identity plus any nilpotent
operator has a square root.


8.39 identity plus nilpotent has a square root


Suppose $T \in \mathcal{L}(V)$ is nilpotent. Then $I + T$ has a square root.


Proof Consider the Taylor series for the function $\sqrt{1 + x}$:

8.40 \[ \sqrt{1 + x} = 1 + a_1 x + a_2 x^2 + \cdots. \]

We do not find an explicit formula for the coefficients or worry about whether
the infinite sum converges because we use this equation only as motivation.

Because $a_1 = \tfrac{1}{2}$, the formula above implies that $1 + \tfrac{x}{2}$ is a good estimate
for $\sqrt{1 + x}$ when $x$ is small.

Because $T$ is nilpotent, $T^m = 0$ for some positive integer $m$. In 8.40, suppose we
replace $x$ with $T$ and 1 with $I$. Then the infinite sum on the right side becomes a
finite sum (because $T^k = 0$ for all $k \ge m$). Thus we guess that there is a square root
of $I + T$ of the form

\[ I + a_1 T + a_2 T^2 + \cdots + a_{m-1} T^{m-1}. \]

Having made this guess, we can try to choose $a_1, a_2, \dots, a_{m-1}$ such that the operator
above has its square equal to $I + T$. Now

\[ (I + a_1 T + a_2 T^2 + a_3 T^3 + \cdots + a_{m-1} T^{m-1})^2 \]
\[ = I + 2a_1 T + (2a_2 + a_1^2) T^2 + (2a_3 + 2a_1 a_2) T^3 + \cdots \]
\[ \quad + (2a_{m-1} + \text{terms involving } a_1, \dots, a_{m-2}) T^{m-1}. \]

We want the right side of the equation above to equal $I + T$. Hence choose $a_1$
such that $2a_1 = 1$ (thus $a_1 = 1/2$). Next, choose $a_2$ such that $2a_2 + a_1^2 = 0$ (thus
$a_2 = -1/8$). Then choose $a_3$ such that the coefficient of $T^3$ on the right side of
the equation above equals 0 (thus $a_3 = 1/16$). Continue in this fashion for each
$k = 4, \dots, m - 1$, at each step solving for $a_k$ so that the coefficient of $T^k$ on the right
side of the equation above equals 0. Actually we do not care about the explicit
formula for the $a_k$'s. We only need to know that some choice of the $a_k$'s gives a
square root of $I + T$.
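
The recursive choice of coefficients in the proof above can be carried out mechanically. The sketch below solves for $a_1, \dots, a_{m-1}$ with exact rational arithmetic and then checks, on one nilpotent matrix, that the resulting polynomial in the matrix squares to the identity plus that matrix. The test matrix `N` (the standard-basis matrix of a shift operator on a 4-dimensional space) and all function names are our own illustrative choices.

```python
from fractions import Fraction


def sqrt_coefficients(m):
    """Return [a_0, ..., a_{m-1}] with a_0 = 1, chosen so that the square of
    a_0 + a_1 x + ... + a_{m-1} x^{m-1} agrees with 1 + x below degree m."""
    a = [Fraction(1)]
    for k in range(1, m):
        # Coefficient of x^k in the square is 2*a_k + sum_{i=1}^{k-1} a_i a_{k-i}.
        cross = sum((a[i] * a[k - i] for i in range(1, k)), Fraction(0))
        target = Fraction(1) if k == 1 else Fraction(0)
        a.append((target - cross) / 2)
    return a


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def square_root_of_identity_plus(nil, m):
    """Return I + a_1 N + ... + a_{m-1} N^{m-1}, assuming N^m = 0."""
    size = len(nil)
    result = [[Fraction(0)] * size for _ in range(size)]
    power = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for coeff in sqrt_coefficients(m):
        result = [[result[i][j] + coeff * power[i][j] for j in range(size)]
                  for i in range(size)]
        power = matmul(power, nil)
    return result


# Illustrative nilpotent matrix with N^4 = 0.
N = [[0, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]

R = square_root_of_identity_plus(N, 4)
identity_plus_N = [[int(i == j) + N[i][j] for j in range(4)] for i in range(4)]
print(sqrt_coefficients(4))
print(matmul(R, R) == identity_plus_N)
```

The first few coefficients produced this way match the values 1/2, -1/8, and 1/16 stated in the proof.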
The previous lemma is valid on real and complex vector spaces. However, the
result below holds only on complex vector spaces. For example, the operator of
multiplication by $-1$ on the one-dimensional real vector space $\mathbf{R}$ has no square
root.

For the proof below, we need to know that every $z \in \mathbf{C}$ has a square root in $\mathbf{C}$.
To show this, write

\[ z = r(\cos\theta + i \sin\theta), \]

where $r$ is the length of the line segment in the complex plane from the origin to $z$
and $\theta$ is the angle of that line segment with the positive horizontal axis. Then

\[ \sqrt{r}\,\bigl(\cos\tfrac{\theta}{2} + i \sin\tfrac{\theta}{2}\bigr) \]

is a square root of $z$, as you can verify by showing that the square of the complex
number above equals $z$.

[Figure: Representation of a complex number with polar coordinates.]
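
The polar-coordinate recipe above translates directly into a few lines of code. The sketch below uses Python's standard `cmath.polar` to obtain $r$ and $\theta$ and then forms $\sqrt{r}(\cos\frac{\theta}{2} + i\sin\frac{\theta}{2})$; the function name `polar_sqrt` and the sample inputs are ours. Because floating-point arithmetic is inexact, the check compares the square with the input up to a small tolerance.

```python
import cmath
import math


def polar_sqrt(z):
    """Return a square root of z built from its polar representation."""
    r, theta = cmath.polar(z)  # z = r (cos theta + i sin theta)
    return math.sqrt(r) * complex(math.cos(theta / 2), math.sin(theta / 2))


samples = [-1, 1j, 3 - 4j, 0]
errors = [abs(polar_sqrt(z) ** 2 - z) for z in samples]
print(all(err < 1e-12 for err in errors))
```

In particular the real number $-1$, which has no real square root, is handled without any special case.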


8.41 over C, invertible operators have square roots 


Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$ is invertible. Then $T$ has
a square root.

Proof Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues of $T$. For each $k$, there exists a
nilpotent operator $T_k \in \mathcal{L}(G(\lambda_k, T))$ such that $T|_{G(\lambda_k, T)} = \lambda_k I + T_k$ [see 8.22(b)].
Because $T$ is invertible, none of the $\lambda_k$'s equals 0, so we can write

\[ T|_{G(\lambda_k, T)} = \lambda_k \Bigl(I + \frac{T_k}{\lambda_k}\Bigr) \]

for each $k$. Because $T_k/\lambda_k$ is nilpotent, $I + T_k/\lambda_k$ has a square root (by 8.39).
Multiplying a square root of the complex number $\lambda_k$ by a square root of $I + T_k/\lambda_k$,
we obtain a square root $R_k$ of $T|_{G(\lambda_k, T)}$.

By the generalized eigenspace decomposition (8.22), a typical vector $v \in V$
can be written uniquely in the form

\[ v = u_1 + \cdots + u_m, \]

where each $u_k$ is in $G(\lambda_k, T)$. Using this decomposition, define an operator
$R \in \mathcal{L}(V)$ by

\[ Rv = R_1 u_1 + \cdots + R_m u_m. \]

You should verify that this operator $R$ is a square root of $T$, completing the proof.

By imitating the techniques in this subsection, you should be able to prove that
if $V$ is a complex vector space and $T \in \mathcal{L}(V)$ is invertible, then $T$ has a $k$th root
for every positive integer $k$.
Jordan Form


We know that if $V$ is a complex vector space, then for every $T \in \mathcal{L}(V)$ there is a
basis of $V$ with respect to which $T$ has a nice upper-triangular matrix (see 8.37).
In this subsection we will see that we can do even better: there is a basis of $V$
with respect to which the matrix of $T$ contains 0's everywhere except possibly on
the diagonal and the line directly above the diagonal.

We begin by looking at two examples of nilpotent operators.


8.42 example: nilpotent operator with nice matrix


Let $T$ be the operator on $\mathbf{C}^4$ defined by

\[ T(z_1, z_2, z_3, z_4) = (0, z_1, z_2, z_3). \]

Then $T^4 = 0$; thus $T$ is nilpotent. If $v = (1, 0, 0, 0)$, then $T^3 v, T^2 v, Tv, v$ is a basis
of $\mathbf{C}^4$. The matrix of $T$ with respect to this basis is

\[ \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \]


The next example of a nilpotent operator has more complicated behavior than 
the example above. 


8.43 example: nilpotent operator with slightly more complicated matrix 


Let $T$ be the operator on $\mathbf{C}^6$ defined by

\[ T(z_1, z_2, z_3, z_4, z_5, z_6) = (0, z_1, z_2, 0, z_4, 0). \]

Then $T^3 = 0$; thus $T$ is nilpotent. In contrast to the nice behavior of the nilpotent
operator of the previous example, for this nilpotent operator there does not exist
a vector $v \in \mathbf{C}^6$ such that $T^5 v, T^4 v, T^3 v, T^2 v, Tv, v$ is a basis of $\mathbf{C}^6$. However, if
we take $v_1 = (1, 0, 0, 0, 0, 0)$, $v_2 = (0, 0, 0, 1, 0, 0)$, and $v_3 = (0, 0, 0, 0, 0, 1)$, then
$T^2 v_1, Tv_1, v_1, Tv_2, v_2, v_3$ is a basis of $\mathbf{C}^6$. The matrix of $T$ with respect to this
basis is

\[ \left(\begin{array}{ccc|cc|c}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\ \hline
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\ \hline
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right). \]

Here the inner matrices are blocked off to show that we can think of the 6-by-6
matrix above as a block diagonal matrix consisting of a 3-by-3 block with 1's on
the line above the diagonal and 0's elsewhere, a 2-by-2 block with 1 above the
diagonal and 0's elsewhere, and a 1-by-1 block containing 0.
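
The basis in Example 8.43 is built from three chains of the form $T^m v, \dots, Tv, v$. The sketch below generates those chains by repeatedly applying the operator and then confirms that each basis vector is mapped to the combination of basis vectors recorded in the corresponding column of the 6-by-6 matrix of the example. The helper names (`chain`, `column_combination`) are ours.

```python
def T(z):
    """The operator of Example 8.43, acting on 6-tuples."""
    z1, z2, z3, z4, z5, z6 = z
    return (0, z1, z2, 0, z4, 0)


def chain(v):
    """Return [T^m v, ..., Tv, v], where m is the largest power with T^m v != 0."""
    vectors = [v]
    while any(T(vectors[0])):
        vectors.insert(0, T(vectors[0]))
    return vectors


v1 = (1, 0, 0, 0, 0, 0)
v2 = (0, 0, 0, 1, 0, 0)
v3 = (0, 0, 0, 0, 0, 1)
basis = chain(v1) + chain(v2) + chain(v3)

# Matrix from the example: 1's directly above the diagonal inside each block.
J = [[0, 1, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]]


def column_combination(j):
    """Return sum_i J[i][j] * basis[i], the vector described by column j of J."""
    return tuple(sum(J[i][j] * basis[i][c] for i in range(6)) for c in range(6))


matches = all(T(basis[j]) == column_combination(j) for j in range(6))
print([len(chain(v)) for v in (v1, v2, v3)], matches)
```

The chain lengths 3, 2, and 1 are exactly the sizes of the three diagonal blocks.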
Our next goal is to show that every nilpotent operator $T \in \mathcal{L}(V)$ behaves
similarly to the operator in the previous example. Specifically, there is a finite
collection of vectors $v_1, \dots, v_n \in V$ such that there is a basis of V consisting of
the vectors of the form $T^j v_k$, as k varies from 1 to n and j varies (in reverse order)
from 0 to the largest nonnegative integer $m_k$ such that $T^{m_k} v_k \neq 0$. With respect to
this basis, the matrix of T looks like the matrix in the previous example. More
specifically, T has a block diagonal matrix with respect to this basis, with each
block a square matrix that is 0 everywhere except on the line above the diagonal.

In the next definition, the diagonal of each $A_k$ is filled with some eigenvalue
$\lambda_k$ of T, the line directly above the diagonal of $A_k$ is filled with 1’s, and all other
entries in $A_k$ are 0 (to understand why each $\lambda_k$ is an eigenvalue of T, see 5.41).
The $\lambda_k$’s need not be distinct. Also, $A_k$ may be a 1-by-1 matrix $(\lambda_k)$ containing
just an eigenvalue of T. If each $\lambda_k$ is 0, then the next definition captures the
behavior described in the paragraph above (recall that if T is nilpotent, then 0 is
the only eigenvalue of T).


8.44 definition: Jordan basis 


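Suppose $T \in \mathcal{L}(V)$. A basis of V is called a Jordan basis for T if with respect
to this basis T has a block diagonal matrix

\[
\begin{pmatrix}
A_1 & & 0 \\
& \ddots & \\
0 & & A_p
\end{pmatrix}
\]

in which each $A_k$ is an upper-triangular matrix of the form

\[
A_k = \begin{pmatrix}
\lambda_k & 1 & & 0 \\
& \ddots & \ddots & \\
& & \ddots & 1 \\
0 & & & \lambda_k
\end{pmatrix}.
\]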


Most of the work in proving that every operator on a finite-dimensional complex
vector space has a Jordan basis occurs in proving the special case below
of nilpotent operators. This special case holds on real vector spaces as well as
complex vector spaces.


8.45 every nilpotent operator has a Jordan basis


Suppose $T \in \mathcal{L}(V)$ is nilpotent. Then there is a basis of V that is a Jordan
basis for T.


Proof We will prove this result by induction on dim V. To get started, note that
the desired result holds if dim V = 1 (because in that case, the only nilpotent
operator is the 0 operator). Now assume that dim V > 1 and that the desired result
holds on all vector spaces of smaller dimension.
Let m be the smallest positive integer such that $T^m = 0$. Thus there exists
$u \in V$ such that $T^{m-1}u \neq 0$. Let

\[ U = \operatorname{span}(u, Tu, \dots, T^{m-1}u). \]

The list $u, Tu, \dots, T^{m-1}u$ is linearly independent (see Exercise 2 in Section 8A).
If U = V, then writing this list in reverse order gives a Jordan basis for T and we
are done. Thus we can assume that $U \neq V$.

Note that U is invariant under T. By our induction hypothesis, there is a basis
of U that is a Jordan basis for $T|_U$. The strategy of our proof is that we will find a
subspace W of V such that W is also invariant under T and $V = U \oplus W$. Again
by our induction hypothesis, there will be a basis of W that is a Jordan basis for
$T|_W$. Putting together the Jordan bases for $T|_U$ and $T|_W$, we will have a Jordan
basis for T.
Let $\varphi \in V'$ be such that $\varphi(T^{m-1}u) \neq 0$. Let

\[ W = \{ v \in V : \varphi(T^k v) = 0 \text{ for each } k = 0, \dots, m-1 \}. \]

Then W is a subspace of V that is invariant under T (the invariance holds because
if $v \in W$ then $\varphi(T^k(Tv)) = 0$ for $k = 0, \dots, m-1$, where the case $k = m-1$
holds because $T^m = 0$). We will show that $V = U \oplus W$, which by the previous
paragraph will complete the proof.
To show that U + W is a direct sum, suppose $v \in U \cap W$ with $v \neq 0$. Because
$v \in U$, there exist $c_0, \dots, c_{m-1} \in \mathbf{F}$ such that

\[ v = c_0 u + c_1 Tu + \cdots + c_{m-1} T^{m-1}u. \]

Let j be the smallest index such that $c_j \neq 0$. Apply $T^{m-j-1}$ to both sides of the
equation above, getting

\[ T^{m-j-1}v = c_j T^{m-1}u, \]

where we have used the equation $T^m = 0$. Now apply $\varphi$ to both sides of the
equation above, getting

\[ \varphi(T^{m-j-1}v) = c_j \varphi(T^{m-1}u) \neq 0. \]

The equation above shows that $v \notin W$, contradicting the assumption that
$v \in U \cap W$. Hence we have proved that $U \cap W = \{0\}$,
which implies that U + W is a direct sum (see 1.46).
To show that $U \oplus W = V$, define $S\colon V \to \mathbf{F}^m$ by

\[ Sv = \bigl(\varphi(v), \varphi(Tv), \dots, \varphi(T^{m-1}v)\bigr). \]

Thus null S = W. Hence

\[ \dim W = \dim \operatorname{null} S = \dim V - \dim \operatorname{range} S \geq \dim V - m, \]

where the second equality comes from the fundamental theorem of linear maps
(3.21). Using the inequality above, we have

\[ \dim(U \oplus W) = \dim U + \dim W \geq m + (\dim V - m) = \dim V. \]

Thus $U \oplus W = V$ (by 2.39), completing the proof.
Now the generalized eigenspace decomposition allows us to extend the previous
result to operators that may not be nilpotent. Doing this requires that we deal
with complex vector spaces.

Camille Jordan (1838-1922) published a proof of 8.46 in 1870.


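8.46 Jordan form


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Then there is a basis of V that is a Jordan
basis for T.


Proof Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues of T. The generalized eigenspace
decomposition states that

\[ V = G(\lambda_1, T) \oplus \cdots \oplus G(\lambda_m, T), \]

where each $(T - \lambda_k I)|_{G(\lambda_k, T)}$ is nilpotent (see 8.22). Thus 8.45 implies that some
basis of each $G(\lambda_k, T)$ is a Jordan basis for $(T - \lambda_k I)|_{G(\lambda_k, T)}$. Put these bases
together to get a basis of V that is a Jordan basis for T.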


Exercises 8C 


1 Suppose $T \in \mathcal{L}(\mathbf{C}^3)$ is the operator defined by $T(z_1, z_2, z_3) = (z_2, z_3, 0)$.
Prove that T does not have a square root.


2 Define $T \in \mathcal{L}(\mathbf{F}^5)$ by $T(x_1, x_2, x_3, x_4, x_5) = (2x_2, 3x_3, -x_4, 4x_5, 0)$.


(a) Show that T is nilpotent. 
(b) Find a square root of I + T. 


3 Suppose V is a complex vector space. Prove that every invertible operator 
on V has a cube root. 


4 Suppose V is a real vector space. Prove that the operator —I on V has a 
square root if and only if dim V is an even number. 


5 Suppose $T \in \mathcal{L}(\mathbf{C}^2)$ is the operator defined by $T(w, z) = (-w - z, 9w + 5z)$.
Find a Jordan basis for T.


6 Find a basis of $\mathcal{P}_4(\mathbf{R})$ that is a Jordan basis for the differentiation operator
D on $\mathcal{P}_4(\mathbf{R})$ defined by $Dp = p'$.


7 Suppose $T \in \mathcal{L}(V)$ is nilpotent and $v_1, \dots, v_n$ is a Jordan basis for T. Prove
that the minimal polynomial of T is $z^{m+1}$, where m is the length of the
longest consecutive string of 1’s that appears on the line directly above the
diagonal in the matrix of T with respect to $v_1, \dots, v_n$.


8 Suppose $T \in \mathcal{L}(V)$ and $v_1, \dots, v_n$ is a basis of V that is a Jordan basis for T.
Describe the matrix of $T^2$ with respect to this basis.
9 Suppose $T \in \mathcal{L}(V)$ is nilpotent. Explain why there exist $v_1, \dots, v_n \in V$ and
nonnegative integers $m_1, \dots, m_n$ such that (a) and (b) below both hold.

(a) $T^{m_1}v_1, \dots, Tv_1, v_1, \dots, T^{m_n}v_n, \dots, Tv_n, v_n$ is a basis of V.

(b) $T^{m_1+1}v_1 = \cdots = T^{m_n+1}v_n = 0$.


10 Suppose $T \in \mathcal{L}(V)$ and $v_1, \dots, v_n$ is a basis of V that is a Jordan basis for T.
Describe the matrix of T with respect to the basis $v_n, \dots, v_1$ obtained by
reversing the order of the v’s.


11 Suppose $T \in \mathcal{L}(V)$. Explain why every vector in each Jordan basis for T is
a generalized eigenvector of T.


12 Suppose $T \in \mathcal{L}(V)$ is diagonalizable. Show that $\mathcal{M}(T)$ is a diagonal matrix
with respect to every Jordan basis for T.


13 Suppose $T \in \mathcal{L}(V)$ is nilpotent. Prove that if $v_1, \dots, v_n$ are vectors in V and
$m_1, \dots, m_n$ are nonnegative integers such that

\[ T^{m_1}v_1, \dots, Tv_1, v_1, \dots, T^{m_n}v_n, \dots, Tv_n, v_n \text{ is a basis of } V \]

and

\[ T^{m_1+1}v_1 = \cdots = T^{m_n+1}v_n = 0, \]

then $T^{m_1}v_1, \dots, T^{m_n}v_n$ is a basis of null T.
This exercise shows that n = dim null T. Thus the positive integer n that
appears above depends only on T and not on the specific Jordan basis
chosen for T.


14 Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Prove that there does not exist a direct sum
decomposition of V into two nonzero subspaces invariant under T if and
only if the minimal polynomial of T is of the form $(z - \lambda)^{\dim V}$ for some
$\lambda \in \mathbf{C}$.
8D Trace: A Connection Between Matrices and Operators


We begin this section by defining the trace of a square matrix. After developing 
some properties of the trace of a square matrix, we will use this concept to define 
the trace of an operator. 


8.47 definition: trace of a matrix


Suppose A is a square matrix with entries in F. The trace of A, denoted by
tr A, is defined to be the sum of the diagonal entries of A.


8.48 example: trace of a 3-by-3 matrix 


Suppose

\[
A = \begin{pmatrix}
3 & -1 & -2 \\
3 & 2 & -3 \\
1 & 2 & 0
\end{pmatrix}.
\]

The diagonal entries of A, which are shown in red above, are 3, 2, and 0. Thus
tr A = 3 + 2 + 0 = 5.


Matrix multiplication is not commutative, but the next result shows that the 
order of matrix multiplication does not matter to the trace. 


8.49 trace of AB equals trace of BA 


Suppose A is an m-by-n matrix and B is an n-by-m matrix. Then 


tr(AB) = tr(BA). 


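Proof Suppose

\[
A = \begin{pmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{pmatrix},
\qquad
B = \begin{pmatrix} B_{1,1} & \cdots & B_{1,m} \\ \vdots & & \vdots \\ B_{n,1} & \cdots & B_{n,m} \end{pmatrix}.
\]

The $j^{\text{th}}$ term on the diagonal of the m-by-m matrix AB equals $\sum_{k=1}^n A_{j,k}B_{k,j}$. Thus

\begin{align*}
\operatorname{tr}(AB) &= \sum_{j=1}^m \sum_{k=1}^n A_{j,k}B_{k,j} \\
&= \sum_{k=1}^n \sum_{j=1}^m B_{k,j}A_{j,k} \\
&= \sum_{k=1}^n \bigl(k^{\text{th}} \text{ term on diagonal of the n-by-n matrix } BA\bigr) \\
&= \operatorname{tr}(BA),
\end{align*}

as desired.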
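The identity tr(AB) = tr(BA) can be illustrated with matrices that are not even square. The sketch below is not from the original text; the matrices and the helper names `matmul` and `trace` are chosen only for illustration. It takes a 2-by-3 matrix A and a 3-by-2 matrix B, so that AB is 2-by-2 while BA is 3-by-3.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2, 3],
     [4, 5, 6]]          # 2-by-3
B = [[7, 8],
     [9, 10],
     [11, 12]]           # 3-by-2

AB = matmul(A, B)        # 2-by-2
BA = matmul(B, A)        # 3-by-3

print(trace(AB), trace(BA))
```

Although AB and BA have different sizes, both traces equal 212 here, as the double-sum argument in the proof predicts.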
We want to define the trace of an operator $T \in \mathcal{L}(V)$ to be the trace of the
matrix of T with respect to some basis of V. However, this definition should not
depend on the choice of basis. The following result will make this possible.


8.50 trace of matrix of operator does not depend on basis 


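Suppose $T \in \mathcal{L}(V)$. Suppose $u_1, \dots, u_n$ and $v_1, \dots, v_n$ are bases of V. Then

\[ \operatorname{tr} \mathcal{M}\bigl(T, (u_1, \dots, u_n)\bigr) = \operatorname{tr} \mathcal{M}\bigl(T, (v_1, \dots, v_n)\bigr). \]


Proof Let $A = \mathcal{M}(T, (u_1, \dots, u_n))$ and $B = \mathcal{M}(T, (v_1, \dots, v_n))$. The change-of-basis
formula tells us that there exists an invertible n-by-n matrix C such that
$A = C^{-1}BC$ (see 3.84). Thus

\begin{align*}
\operatorname{tr} A &= \operatorname{tr}\bigl((C^{-1}B)C\bigr) \\
&= \operatorname{tr}\bigl(C(C^{-1}B)\bigr) \\
&= \operatorname{tr}\bigl((CC^{-1})B\bigr) \\
&= \operatorname{tr} B,
\end{align*}

where the second line comes from 8.49.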


Because of 8.50, the following definition now makes sense. 


8.51 definition: trace of an operator 


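Suppose $T \in \mathcal{L}(V)$. The trace of T, denoted by tr T, is defined by

\[ \operatorname{tr} T = \operatorname{tr} \mathcal{M}\bigl(T, (v_1, \dots, v_n)\bigr), \]

where $v_1, \dots, v_n$ is any basis of V.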


Suppose $T \in \mathcal{L}(V)$ and $\lambda$ is an eigenvalue of T. Recall that we defined the
multiplicity of $\lambda$ to be the dimension of the generalized eigenspace $G(\lambda, T)$ (see
8.23); we proved that this multiplicity equals $\dim \operatorname{null}(T - \lambda I)^{\dim V}$ (see 8.20).
Recall also that if V is a complex vector space, then the sum of the multiplicities
of all eigenvalues of T equals dim V (see 8.25).

In the definition below, the sum of the eigenvalues “with each eigenvalue
included as many times as its multiplicity” means that if $\lambda_1, \dots, \lambda_m$ are the distinct
eigenvalues of T with multiplicities $d_1, \dots, d_m$, then the sum is

\[ d_1\lambda_1 + \cdots + d_m\lambda_m. \]

Or if you prefer to work with a list of not-necessarily-distinct eigenvalues, with
each eigenvalue included as many times as its multiplicity, then the eigenvalues
could be denoted by $\lambda_1, \dots, \lambda_n$ (where n equals dim V) and the sum is

\[ \lambda_1 + \cdots + \lambda_n. \]
8.52 on complex vector spaces, trace equals sum of eigenvalues


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Then tr T equals the sum of the eigenvalues
of T, with each eigenvalue included as many times as its multiplicity.


Proof There is a basis of V with respect to which T has an upper-triangular 
matrix with the diagonal entries of the matrix consisting of the eigenvalues of T, 
with each eigenvalue included as many times as its multiplicity—see 8.37. Thus 
the definition of the trace of an operator along with 8.50, which allows us to use a 
basis of our choice, implies that tr T equals the sum of the eigenvalues of T, with 
each eigenvalue included as many times as its multiplicity. 


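8.53 example: trace of an operator on $\mathbf{C}^3$


Suppose $T \in \mathcal{L}(\mathbf{C}^3)$ is defined by

\[ T(z_1, z_2, z_3) = (3z_1 - z_2 - 2z_3,\; 3z_1 + 2z_2 - 3z_3,\; z_1 + 2z_2). \]

Then the matrix of T with respect to the standard basis of $\mathbf{C}^3$ is

\[
\begin{pmatrix}
3 & -1 & -2 \\
3 & 2 & -3 \\
1 & 2 & 0
\end{pmatrix}.
\]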


Adding up the diagonal entries of this matrix, we see that tr T = 5. 

The eigenvalues of T are 1, 2 + 3i, and 2 — 3i, each with multiplicity 1, as 
you can verify. The sum of these eigenvalues, each included as many times as its 
multiplicity, is 1 + (2 + 3i) + (2 — 3i), which equals 5, as expected by 8.52. 
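The verification suggested in this example can be sketched in a few lines of Python. This block is not part of the original text. It assumes the standard fact that the eigenvalues of a matrix A are the zeros of det(zI − A), a fact that relies on determinants, which this book develops only in Chapter 9. The helper names `det3` and `char_poly` are illustrative.

```python
A = [[3, -1, -2],
     [3,  2, -3],
     [1,  2,  0]]

def det3(M):
    """Determinant of a 3-by-3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def char_poly(z):
    """det(zI - A) evaluated at the (possibly complex) number z."""
    shifted = [[(z if j == k else 0) - A[j][k] for k in range(3)]
               for j in range(3)]
    return det3(shifted)

trace_A = sum(A[j][j] for j in range(3))

eigenvalues = [1, 2 + 3j, 2 - 3j]
# Each residual should vanish if the listed numbers are eigenvalues.
residuals = [abs(char_poly(z)) for z in eigenvalues]
eigen_sum = sum(eigenvalues)

print(trace_A, eigen_sum, residuals)
```

The trace computed from the diagonal and the sum of the three listed eigenvalues both come out to 5, consistent with 8.52.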


The trace has a close connection with the characteristic polynomial. Suppose
$\mathbf{F} = \mathbf{C}$, $T \in \mathcal{L}(V)$, and $\lambda_1, \dots, \lambda_n$ are the eigenvalues of T, with each eigenvalue
included as many times as its multiplicity. Then by definition (see 8.26), the
characteristic polynomial of T equals

\[ (z - \lambda_1) \cdots (z - \lambda_n). \]

Expanding the polynomial above, we can write the characteristic polynomial of T
in the form

\[ z^n - (\lambda_1 + \cdots + \lambda_n) z^{n-1} + \cdots + (-1)^n (\lambda_1 \cdots \lambda_n). \]


The expression above immediately leads to the next result. Also see 9.65, 
which does not require the hypothesis that F = C. 


8.54 trace and characteristic polynomial 


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Let n = dim V. Then tr T equals the negative
of the coefficient of $z^{n-1}$ in the characteristic polynomial of T.
The next result gives a nice formula for the trace of an operator on an inner
product space.


8.55 trace on an inner product space


Suppose V is an inner product space, $T \in \mathcal{L}(V)$, and $e_1, \dots, e_n$ is an orthonormal
basis of V. Then

\[ \operatorname{tr} T = \langle Te_1, e_1 \rangle + \cdots + \langle Te_n, e_n \rangle. \]


Proof The desired formula follows from the observation that the entry in row k,
column k of $\mathcal{M}(T, (e_1, \dots, e_n))$ equals $\langle Te_k, e_k \rangle$ [use 6.30(a) with $v = Te_k$].


The algebraic properties of the trace as defined on square matrices translate 
to algebraic properties of the trace as defined on operators, as shown in the next 
result. 


8.56 trace is linear 


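The function $\operatorname{tr}\colon \mathcal{L}(V) \to \mathbf{F}$ is a linear functional on $\mathcal{L}(V)$ such that

\[ \operatorname{tr}(ST) = \operatorname{tr}(TS) \]

for all $S, T \in \mathcal{L}(V)$.


Proof Choose a basis of V. All matrices of operators in this proof will be with
respect to that basis. Suppose $S, T \in \mathcal{L}(V)$.
If $\lambda \in \mathbf{F}$, then

\[ \operatorname{tr}(\lambda T) = \operatorname{tr} \mathcal{M}(\lambda T) = \operatorname{tr}\bigl(\lambda \mathcal{M}(T)\bigr) = \lambda \operatorname{tr} \mathcal{M}(T) = \lambda \operatorname{tr} T, \]

where the first and last equalities come from the definition of the trace of an
operator, the second equality comes from 3.38, and the third equality follows
from the definition of the trace of a square matrix.

Also,

\[ \operatorname{tr}(S + T) = \operatorname{tr} \mathcal{M}(S + T) = \operatorname{tr}\bigl(\mathcal{M}(S) + \mathcal{M}(T)\bigr) = \operatorname{tr} \mathcal{M}(S) + \operatorname{tr} \mathcal{M}(T) = \operatorname{tr} S + \operatorname{tr} T, \]

where the first and last equalities come from the definition of the trace of an
operator, the second equality comes from 3.35, and the third equality follows
from the definition of the trace of a square matrix. The two paragraphs above
show that $\operatorname{tr}\colon \mathcal{L}(V) \to \mathbf{F}$ is a linear functional on $\mathcal{L}(V)$.

Furthermore,

\[ \operatorname{tr}(ST) = \operatorname{tr} \mathcal{M}(ST) = \operatorname{tr}\bigl(\mathcal{M}(S)\mathcal{M}(T)\bigr) = \operatorname{tr}\bigl(\mathcal{M}(T)\mathcal{M}(S)\bigr) = \operatorname{tr} \mathcal{M}(TS) = \operatorname{tr}(TS), \]

where the second and fourth equalities come from 3.43 and the crucial third
equality comes from 8.49.


The equations tr(ST) = tr(TS) and tr I = dim V uniquely characterize the
trace among the linear functionals on $\mathcal{L}(V)$ (see Exercise 10).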
The equation tr(ST) = tr(TS) leads to our next result, which does not hold on
infinite-dimensional vector spaces (see Exercise 13). However, additional
hypotheses on S, T, and V lead to an infinite-dimensional generalization of the
result below, with important applications to quantum theory.


The statement of the next result does 
not involve traces, but the short proof 
uses traces. When something like this 
happens in mathematics, then usually 
a good definition lurks in the back- 
ground. 


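8.57 identity operator is not the difference of ST and TS


There do not exist operators $S, T \in \mathcal{L}(V)$ such that ST − TS = I.


Proof Suppose $S, T \in \mathcal{L}(V)$. Then

\[ \operatorname{tr}(ST - TS) = \operatorname{tr}(ST) - \operatorname{tr}(TS) = 0, \]

where both equalities come from 8.56. The trace of I equals dim V, which is not 0.
Because ST − TS and I have different traces, they cannot be equal.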


Exercises 8D 


1 Suppose V is an inner product space and $v, w \in V$. Define an operator
$T \in \mathcal{L}(V)$ by $Tu = \langle u, v \rangle w$. Find a formula for tr T.


2 Suppose $P \in \mathcal{L}(V)$ satisfies $P^2 = P$. Prove that

\[ \operatorname{tr} P = \dim \operatorname{range} P. \]


3 Suppose $T \in \mathcal{L}(V)$ and $T^5 = T$. Prove that the real and imaginary parts of
tr T are both integers.


4 Suppose V is an inner product space and $T \in \mathcal{L}(V)$. Prove that

\[ \operatorname{tr} T^* = \overline{\operatorname{tr} T}. \]


5 Suppose V is an inner product space. Suppose T € L(V) is a positive 
operator and tr T = 0. Prove that T = 0. 


6 Suppose V is an inner product space and $P, Q \in \mathcal{L}(V)$ are orthogonal
projections. Prove that $\operatorname{tr}(PQ) \geq 0$.


7 Suppose $T \in \mathcal{L}(\mathbf{C}^3)$ is the operator whose matrix is

\[
\begin{pmatrix}
51 & -12 & -21 \\
60 & -40 & -28 \\
57 & -68 & 1
\end{pmatrix}.
\]

Someone tells you (accurately) that $-48$ and 24 are eigenvalues of T. Without
using a computer or writing anything down, find the third eigenvalue of T.
8 Prove or give a counterexample: If $S, T \in \mathcal{L}(V)$, then tr(ST) = (tr S)(tr T).


9 Suppose $T \in \mathcal{L}(V)$ is such that tr(ST) = 0 for all $S \in \mathcal{L}(V)$. Prove that
T = 0.


10 Prove that the trace is the only linear functional $\tau\colon \mathcal{L}(V) \to \mathbf{F}$ such that

\[ \tau(ST) = \tau(TS) \]

for all $S, T \in \mathcal{L}(V)$ and $\tau(I) = \dim V$.
Hint: Suppose that $v_1, \dots, v_n$ is a basis of V. For $j, k \in \{1, \dots, n\}$, define
$P_{j,k} \in \mathcal{L}(V)$ by $P_{j,k}(a_1v_1 + \cdots + a_nv_n) = a_k v_j$. Prove that

\[ \tau(P_{j,k}) = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \neq k. \end{cases} \]

Then for $T \in \mathcal{L}(V)$, use the equation $T = \sum_{k=1}^n \sum_{j=1}^n \mathcal{M}(T)_{j,k} P_{j,k}$ to
show that $\tau(T) = \operatorname{tr} T$.


11 Suppose V and W are inner product spaces and $T \in \mathcal{L}(V, W)$. Prove that if
$e_1, \dots, e_n$ is an orthonormal basis of V and $f_1, \dots, f_m$ is an orthonormal basis
of W, then

\[ \operatorname{tr}(T^*T) = \sum_{k=1}^n \sum_{j=1}^m |\langle Te_k, f_j \rangle|^2. \]

The numbers $\langle Te_k, f_j \rangle$ are the entries of the matrix of T with respect to the
orthonormal bases $e_1, \dots, e_n$ and $f_1, \dots, f_m$. These numbers depend on the
bases, but tr(T*T) does not depend on a choice of bases. Thus this exercise
shows that the sum of the squares of the absolute values of the matrix entries
does not depend on which orthonormal bases are used.


12 Suppose V and W are finite-dimensional inner product spaces.


(a) Prove that $\langle S, T \rangle = \operatorname{tr}(T^*S)$ defines an inner product on $\mathcal{L}(V, W)$.

(b) Suppose $e_1, \dots, e_n$ is an orthonormal basis of V and $f_1, \dots, f_m$ is an
orthonormal basis of W. Show that the inner product on $\mathcal{L}(V, W)$ from
(a) is the same as the standard inner product on $\mathbf{F}^{mn}$, where we identify
each element of $\mathcal{L}(V, W)$ with its matrix (with respect to the bases just
mentioned) and then with an element of $\mathbf{F}^{mn}$.


Caution: The norm of a linear map $T \in \mathcal{L}(V, W)$ as defined by 7.86 is not
the same as the norm that comes from the inner product in (a) above. Unless
explicitly stated otherwise, always assume that $\|T\|$ refers to the norm as
defined by 7.86. The norm that comes from the inner product in (a) is called
the Frobenius norm or the Hilbert-Schmidt norm.


13 Find $S, T \in \mathcal{L}(\mathcal{P}(\mathbf{F}))$ such that ST − TS = I.


Hint: Make an appropriate modification of the operators in Example 3.9.


This exercise shows that additional hypotheses are needed on S and T to
extend 8.57 to the setting of infinite-dimensional vector spaces.


Chapter 9
Multilinear Algebra and Determinants


We begin this chapter by investigating bilinear forms and quadratic forms on a 
vector space. Then we will move on to multilinear forms. We will show that the 
vector space of alternating n-linear forms has dimension one on a vector space of 
dimension n. This result will allow us to give a clean basis-free definition of the 
determinant of an operator. 

This approach to the determinant via alternating multilinear forms leads to 
straightforward proofs of key properties of determinants. For example, we will see 
that the determinant is multiplicative, meaning that det(ST) = (det S) (det T) for 
all operators S and T on the same vector space. We will also see that T is invertible 
if and only if det T # 0. Another important result states that the determinant of 
an operator on a complex vector space equals the product of the eigenvalues of 
the operator, with each eigenvalue included as many times as its multiplicity. 

The chapter concludes with an introduction to tensor products. 


• F denotes R or C.
• V and W denote finite-dimensional nonzero vector spaces over F.
abstract algebra viewpoint that contributed to the development of linear algebra.


© Sheldon Axler 2024
S. Axler, Linear Algebra Done Right, Undergraduate Texts in Mathematics,
https://doi.org/10.1007/978-3-031-41026-0_9


9A Bilinear Forms and Quadratic Forms


Bilinear Forms 


A bilinear form on V is a function from V x V to F that is linear in each slot 
separately, meaning that if we hold either slot fixed then we have a linear function 
in the other slot. Here is the formal definition. 


9.1 definition: bilinear form 


A bilinear form on V is a function $\beta\colon V \times V \to \mathbf{F}$ such that

\[ v \mapsto \beta(v, u) \quad \text{and} \quad v \mapsto \beta(u, v) \]

are both linear functionals on V for every $u \in V$.


For example, if V is a real inner product space, then the function that takes an
ordered pair $(u, v) \in V \times V$ to $\langle u, v \rangle$ is a bilinear form on V. If V is a nonzero
complex inner product space, then this function is not a bilinear form because
the inner product is not linear in the second slot (complex scalars come out of the
second slot as their complex conjugates).

Recall that the term linear functional, used in the definition above, means
a linear function that maps into the scalar field F. Thus the term bilinear
functional would be more consistent terminology than bilinear form, which
unfortunately has become standard.

If F = R, then a bilinear form differs from an inner product in that an inner
product requires symmetry [meaning that $\beta(v, w) = \beta(w, v)$ for all $v, w \in V$]
and positive definiteness [meaning that $\beta(v, v) > 0$ for all $v \in V \setminus \{0\}$], but these
properties are not required for a bilinear form.


9.2 example: bilinear forms


• The function $\beta\colon \mathbf{F}^3 \times \mathbf{F}^3 \to \mathbf{F}$ defined by

\[ \beta\bigl((x_1, x_2, x_3), (y_1, y_2, y_3)\bigr) = x_1y_2 - 5x_2y_3 + 2x_3y_1 \]

is a bilinear form on $\mathbf{F}^3$.


• Suppose A is an n-by-n matrix with $A_{j,k} \in \mathbf{F}$ in row j, column k. Define a
bilinear form $\beta_A$ on $\mathbf{F}^n$ by

\[ \beta_A\bigl((x_1, \dots, x_n), (y_1, \dots, y_n)\bigr) = \sum_{k=1}^n \sum_{j=1}^n A_{j,k} x_j y_k. \]

The first bullet point is a special case of this bullet point with n = 3 and

\[
A = \begin{pmatrix}
0 & 1 & 0 \\
0 & 0 & -5 \\
2 & 0 & 0
\end{pmatrix}.
\]
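• Suppose V is a real inner product space and $T \in \mathcal{L}(V)$. Then the function
$\beta\colon V \times V \to \mathbf{R}$ defined by

\[ \beta(u, v) = \langle u, Tv \rangle \]

is a bilinear form on V.

• If n is a positive integer, then the function $\beta\colon \mathcal{P}_n(\mathbf{R}) \times \mathcal{P}_n(\mathbf{R}) \to \mathbf{R}$ defined by

\[ \beta(p, q) = p(2) \cdot q'(3) \]

is a bilinear form on $\mathcal{P}_n(\mathbf{R})$.

• Suppose $\varphi, \tau \in V'$. Then the function $\beta\colon V \times V \to \mathbf{F}$ defined by

\[ \beta(u, v) = \varphi(u) \cdot \tau(v) \]

is a bilinear form on V.

• More generally, suppose that $\varphi_1, \dots, \varphi_n, \tau_1, \dots, \tau_n \in V'$. Then the function
$\beta\colon V \times V \to \mathbf{F}$ defined by

\[ \beta(u, v) = \varphi_1(u) \cdot \tau_1(v) + \cdots + \varphi_n(u) \cdot \tau_n(v) \]

is a bilinear form on V.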

A bilinear form on V is a function from $V \times V$ to F. Because $V \times V$ is a vector
space, this raises the question of whether a bilinear form can also be a linear map
from $V \times V$ to F. Note that none of the bilinear forms in 9.2 are linear maps except
in some special cases in which the bilinear form is the zero map. Exercise 3 shows
that a bilinear form $\beta$ on V is a linear map on $V \times V$ only if $\beta = 0$.


9.3 definition: $V^{(2)}$


The set of bilinear forms on V is denoted by $V^{(2)}$.


With the usual operations of addition and scalar multiplication of functions,
$V^{(2)}$ is a vector space.

For T an operator on an n-dimensional vector space V and a basis $e_1, \dots, e_n$
of V, we used an n-by-n matrix to provide information about T. We now do the
same thing for bilinear forms on V.


9.4 definition: matrix of a bilinear form, $\mathcal{M}(\beta)$


Suppose $\beta$ is a bilinear form on V and $e_1, \dots, e_n$ is a basis of V. The matrix of
$\beta$ with respect to this basis is the n-by-n matrix $\mathcal{M}(\beta)$ whose entry $\mathcal{M}(\beta)_{j,k}$
in row j, column k is given by

\[ \mathcal{M}(\beta)_{j,k} = \beta(e_j, e_k). \]

If the basis $e_1, \dots, e_n$ is not clear from the context, then the notation
$\mathcal{M}(\beta, (e_1, \dots, e_n))$ is used.
Recall that $\mathbf{F}^{n,n}$ denotes the vector space of n-by-n matrices with entries in F
and that $\dim \mathbf{F}^{n,n} = n^2$ (see 3.39 and 3.40).


9.5 $\dim V^{(2)} = (\dim V)^2$


Suppose $e_1, \dots, e_n$ is a basis of V. Then the map $\beta \mapsto \mathcal{M}(\beta)$ is an isomorphism
of $V^{(2)}$ onto $\mathbf{F}^{n,n}$. Furthermore, $\dim V^{(2)} = (\dim V)^2$.


Proof The map $\beta \mapsto \mathcal{M}(\beta)$ is clearly a linear map of $V^{(2)}$ into $\mathbf{F}^{n,n}$.
For $A \in \mathbf{F}^{n,n}$, define a bilinear form $\beta_A$ on V by

\[ \beta_A(x_1e_1 + \cdots + x_ne_n,\; y_1e_1 + \cdots + y_ne_n) = \sum_{k=1}^n \sum_{j=1}^n A_{j,k} x_j y_k \]

for $x_1, \dots, x_n, y_1, \dots, y_n \in \mathbf{F}$ (if $V = \mathbf{F}^n$ and $e_1, \dots, e_n$ is the standard basis of $\mathbf{F}^n$, this
$\beta_A$ is the same as the bilinear form $\beta_A$ in the second bullet point of Example 9.2).
The linear map $\beta \mapsto \mathcal{M}(\beta)$ from $V^{(2)}$ to $\mathbf{F}^{n,n}$ and the linear map $A \mapsto \beta_A$ from
$\mathbf{F}^{n,n}$ to $V^{(2)}$ are inverses of each other because $\beta_{\mathcal{M}(\beta)} = \beta$ for all $\beta \in V^{(2)}$ and
$\mathcal{M}(\beta_A) = A$ for all $A \in \mathbf{F}^{n,n}$, as you should verify.
Thus both maps are isomorphisms and the two spaces that they connect have
the same dimension. Hence $\dim V^{(2)} = \dim \mathbf{F}^{n,n} = n^2 = (\dim V)^2$.


Recall that $C^t$ denotes the transpose of a matrix C. The matrix $C^t$ is obtained
by interchanging the rows and the columns of C.


9.6 composition of a bilinear form and an operator 


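Suppose $\beta$ is a bilinear form on V and $T \in \mathcal{L}(V)$. Define bilinear forms $\alpha$
and $\rho$ on V by

\[ \alpha(u, v) = \beta(u, Tv) \quad \text{and} \quad \rho(u, v) = \beta(Tu, v). \]

Let $e_1, \dots, e_n$ be a basis of V. Then

\[ \mathcal{M}(\alpha) = \mathcal{M}(\beta)\mathcal{M}(T) \quad \text{and} \quad \mathcal{M}(\rho) = \mathcal{M}(T)^t \mathcal{M}(\beta). \]


Proof If $j, k \in \{1, \dots, n\}$, then

\begin{align*}
\mathcal{M}(\alpha)_{j,k} &= \alpha(e_j, e_k) \\
&= \beta(e_j, Te_k) \\
&= \beta\Bigl(e_j, \sum_{m=1}^n \mathcal{M}(T)_{m,k} e_m\Bigr) \\
&= \sum_{m=1}^n \beta(e_j, e_m) \mathcal{M}(T)_{m,k} \\
&= \bigl(\mathcal{M}(\beta)\mathcal{M}(T)\bigr)_{j,k}.
\end{align*}

Thus $\mathcal{M}(\alpha) = \mathcal{M}(\beta)\mathcal{M}(T)$. The proof that $\mathcal{M}(\rho) = \mathcal{M}(T)^t \mathcal{M}(\beta)$ is similar.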
The result below shows how the matrix of a bilinear form changes if we change
the basis. The formula in the result below should be compared to the
change-of-basis formula for the matrix of an operator (see 3.84). The two formulas are
similar, except that the transpose $C^t$ appears in the formula below and the inverse
$C^{-1}$ appears in the change-of-basis formula for the matrix of an operator.


9.7 change-of-basis formula 


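Suppose $\beta \in V^{(2)}$. Suppose $e_1, \dots, e_n$ and $f_1, \dots, f_n$ are bases of V. Let

\[ A = \mathcal{M}\bigl(\beta, (e_1, \dots, e_n)\bigr) \quad \text{and} \quad B = \mathcal{M}\bigl(\beta, (f_1, \dots, f_n)\bigr) \]

and $C = \mathcal{M}(I, (e_1, \dots, e_n), (f_1, \dots, f_n))$. Then

\[ A = C^t B C. \]


Proof The linear map lemma (3.4) tells us that there exists an operator $T \in \mathcal{L}(V)$
such that $Tf_k = e_k$ for each k = 1, ..., n. The definition of the matrix of an operator
with respect to a basis implies that

\[ \mathcal{M}\bigl(T, (f_1, \dots, f_n)\bigr) = C. \]

Define bilinear forms $\alpha$, $\rho$ on V by

\[ \alpha(u, v) = \beta(u, Tv) \quad \text{and} \quad \rho(u, v) = \alpha(Tu, v) = \beta(Tu, Tv). \]

Then $\beta(e_j, e_k) = \beta(Tf_j, Tf_k) = \rho(f_j, f_k)$ for all $j, k \in \{1, \dots, n\}$. Thus

\begin{align*}
A &= \mathcal{M}\bigl(\rho, (f_1, \dots, f_n)\bigr) \\
&= C^t \mathcal{M}\bigl(\alpha, (f_1, \dots, f_n)\bigr) \\
&= C^t B C,
\end{align*}

where the second and third lines each follow from 9.6.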


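9.8 example: the matrix of a bilinear form on $\mathcal{P}_2(\mathbf{R})$


Define a bilinear form $\beta$ on $\mathcal{P}_2(\mathbf{R})$ by $\beta(p, q) = p(2) \cdot q'(3)$. Let

\[ A = \mathcal{M}\bigl(\beta, (1, x - 2, (x - 3)^2)\bigr) \quad \text{and} \quad B = \mathcal{M}\bigl(\beta, (1, x, x^2)\bigr) \]

and $C = \mathcal{M}(I, (1, x - 2, (x - 3)^2), (1, x, x^2))$. Then

\[
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\quad \text{and} \quad
B = \begin{pmatrix} 0 & 1 & 6 \\ 0 & 2 & 12 \\ 0 & 4 & 24 \end{pmatrix}
\quad \text{and} \quad
C = \begin{pmatrix} 1 & -2 & 9 \\ 0 & 1 & -6 \\ 0 & 0 & 1 \end{pmatrix}.
\]

Now the change-of-basis formula 9.7 asserts that $A = C^t B C$, which you can verify
with matrix multiplication using the matrices above.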
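The verification mentioned at the end of this example can be sketched in Python. The block below is not part of the original text. It represents a polynomial of degree at most 2 by its list of coefficients (constant term first), a representation chosen here only for illustration, builds the three matrices directly from the definitions, and then checks the change-of-basis formula.

```python
def evaluate(p, x):
    """Value at x of the polynomial with coefficient list p (constant term first)."""
    return sum(c * x**i for i, c in enumerate(p))

def derivative(p):
    """Coefficient list of the derivative of p."""
    return [i * c for i, c in enumerate(p)][1:]

def beta(p, q):
    """The bilinear form beta(p, q) = p(2) * q'(3) from the example."""
    return evaluate(p, 2) * evaluate(derivative(q), 3)

def form_matrix(basis):
    """Matrix of beta with respect to `basis`: entry (j, k) is beta(e_j, e_k)."""
    return [[beta(p, q) for q in basis] for p in basis]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

e_basis = [[1, 0, 0], [-2, 1, 0], [9, -6, 1]]   # 1, x - 2, (x - 3)^2
f_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]     # 1, x, x^2

A = form_matrix(e_basis)
B = form_matrix(f_basis)
# Column k of C lists the coordinates of the k-th e-basis polynomial
# with respect to 1, x, x^2.
C = transpose(e_basis)

print(A)
print(matmul(transpose(C), matmul(B, C)))
```

Both printed matrices should agree with the matrix A displayed in the example.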
Symmetric Bilinear Forms


9.9 definition: symmetric bilinear form, $V^{(2)}_{\text{sym}}$


A bilinear form $\rho \in V^{(2)}$ is called symmetric if

\[ \rho(u, w) = \rho(w, u) \]

for all $u, w \in V$. The set of symmetric bilinear forms on V is denoted by $V^{(2)}_{\text{sym}}$.


9.10 example: symmetric bilinear forms 


• If V is a real inner product space and $\rho \in V^{(2)}$ is defined by

\[ \rho(u, w) = \langle u, w \rangle, \]

then $\rho$ is a symmetric bilinear form on V.

• Suppose V is a real inner product space and $T \in \mathcal{L}(V)$. Define $\rho \in V^{(2)}$ by

\[ \rho(u, w) = \langle u, Tw \rangle. \]

Then $\rho$ is a symmetric bilinear form on V if and only if T is a self-adjoint
operator (the previous bullet point is the special case T = I).

• Suppose $\rho\colon \mathcal{L}(V) \times \mathcal{L}(V) \to \mathbf{F}$ is defined by

\[ \rho(S, T) = \operatorname{tr}(ST). \]

Then $\rho$ is a symmetric bilinear form on $\mathcal{L}(V)$ because trace is a linear functional
on $\mathcal{L}(V)$ and tr(ST) = tr(TS) for all $S, T \in \mathcal{L}(V)$; see 8.56.


A square matrix A is called symmetric if it equals its transpose. 


An operator on V may have a symmetric matrix with respect to some but not all 
bases of V. In contrast, the next result shows that a bilinear form on V has a sym- 
metric matrix with respect to either all bases of V or with respect to no bases of V. 


9.12 symmetric bilinear forms are diagonalizable 


Suppose $\rho \in V^{(2)}$. Then the following are equivalent.

(a) $\rho$ is a symmetric bilinear form on V.

(b) $\mathcal{M}(\rho, (e_1, \dots, e_n))$ is a symmetric matrix for every basis $e_1, \dots, e_n$ of V.

(c) $\mathcal{M}(\rho, (e_1, \dots, e_n))$ is a symmetric matrix for some basis $e_1, \dots, e_n$ of V.

(d) $\mathcal{M}(\rho, (e_1, \dots, e_n))$ is a diagonal matrix for some basis $e_1, \dots, e_n$ of V.
Proof First suppose (a) holds, so $\rho$ is a symmetric bilinear form. Suppose
$e_1, \dots, e_n$ is a basis of V and $j, k \in \{1, \dots, n\}$. Then $\rho(e_j, e_k) = \rho(e_k, e_j)$ because $\rho$
is symmetric. Thus $\mathcal{M}(\rho, (e_1, \dots, e_n))$ is a symmetric matrix, showing that (a)
implies (b).

Clearly (b) implies (c). 

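Now suppose (c) holds and $e_1, \dots, e_n$ is a basis of V such that $\mathcal{M}(\rho, (e_1, \dots, e_n))$
is a symmetric matrix. Suppose $u, w \in V$. There exist $a_1, \dots, a_n, b_1, \dots, b_n \in \mathbf{F}$
such that $u = a_1e_1 + \cdots + a_ne_n$ and $w = b_1e_1 + \cdots + b_ne_n$. Now

\begin{align*}
\rho(u, w) &= \rho\Bigl(\sum_{j=1}^n a_j e_j, \sum_{k=1}^n b_k e_k\Bigr) \\
&= \sum_{j=1}^n \sum_{k=1}^n a_j b_k \rho(e_j, e_k) \\
&= \sum_{j=1}^n \sum_{k=1}^n a_j b_k \rho(e_k, e_j) \\
&= \rho\Bigl(\sum_{k=1}^n b_k e_k, \sum_{j=1}^n a_j e_j\Bigr) \\
&= \rho(w, u),
\end{align*}

where the third line holds because $\mathcal{M}(\rho)$ is a symmetric matrix. The equation
above shows that $\rho$ is a symmetric bilinear form, proving that (c) implies (a).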

At this point, we have proved that (a), (b), (c) are equivalent. Because every 
diagonal matrix is symmetric, (d) implies (c). To complete the proof, we will 
show that (a) implies (d) by induction on n = dim V. 

If n = 1, then (a) implies (d) because every 1-by-1 matrix is diagonal. Now
suppose n > 1 and the implication (a) $\Longrightarrow$ (d) holds for one less dimension.
Suppose (a) holds, so $\rho$ is a symmetric bilinear form. If $\rho = 0$, then the matrix of
$\rho$ with respect to every basis of V is the zero matrix, which is a diagonal matrix.
Hence we can assume that $\rho \neq 0$, which means there exist $u, w \in V$ such that
$\rho(u, w) \neq 0$. Now

\[ 2\rho(u, w) = \rho(u + w, u + w) - \rho(u, u) - \rho(w, w). \]

Because the left side of the equation above is nonzero, the three terms on the right
cannot all equal 0. Hence there exists $v \in V$ such that $\rho(v, v) \neq 0$.

Let $U = \{u \in V : \rho(u, v) = 0\}$. Thus U is the null space of the linear
functional $u \mapsto \rho(u, v)$ on V. This linear functional is not the zero linear functional
because $v \notin U$. Thus dim U = n − 1. By our induction hypothesis, there is a
basis $e_1, \dots, e_{n-1}$ of U such that the symmetric bilinear form $\rho|_{U \times U}$ has a diagonal
matrix with respect to this basis.

Because $v \notin U$, the list $e_1, \dots, e_{n-1}, v$ is a basis of V. Suppose $k \in \{1, \dots, n-1\}$.
Then $\rho(e_k, v) = 0$ by the construction of U. Because $\rho$ is symmetric, we also
have $\rho(v, e_k) = 0$. Thus the matrix of $\rho$ with respect to $e_1, \dots, e_{n-1}, v$ is a diagonal
matrix, completing the proof that (a) implies (d).
The previous result states that every symmetric bilinear form has a diagonal
matrix with respect to some basis. If our vector space happens to be a real inner 
product space, then the next result shows that every symmetric bilinear form has 
a diagonal matrix with respect to some orthonormal basis. Note that the inner 
product here is unrelated to the bilinear form. 


9.13 diagonalization of a symmetric bilinear form by an orthonormal basis 


Suppose V is a real inner product space and $\rho$ is a symmetric bilinear form on
V. Then $\rho$ has a diagonal matrix with respect to some orthonormal basis of V.


Proof Let $f_1, \dots, f_n$ be an orthonormal basis of V. Let $B = \mathcal{M}(\rho, (f_1, \dots, f_n))$.
Then B is a symmetric matrix (by 9.12). Let $T \in \mathcal{L}(V)$ be the operator such that
$\mathcal{M}(T, (f_1, \dots, f_n)) = B$. Thus T is self-adjoint.

The real spectral theorem (7.29) states that T has a diagonal matrix with respect
to some orthonormal basis $e_1, \dots, e_n$ of V. Let $C = \mathcal{M}(I, (e_1, \dots, e_n), (f_1, \dots, f_n))$.
Thus $C^{-1}BC$ is the matrix of T with respect to the basis $e_1, \dots, e_n$ (by 3.84). Hence
$C^{-1}BC$ is a diagonal matrix. Now

\[ \mathcal{M}\bigl(\rho, (e_1, \dots, e_n)\bigr) = C^t B C = C^{-1} B C, \]

where the first equality holds by 9.7 and the second equality holds because C is a
unitary matrix with real entries (which implies that $C^{-1} = C^t$; see 7.57).


Now we turn our attention to alternating bilinear forms. Alternating multilinear 
forms will play a major role in our approach to determinants later in this chapter. 


9.14 definition: alternating bilinear form, $V^{(2)}_{\text{alt}}$

A bilinear form $\alpha \in V^{(2)}$ is called alternating if

\[ \alpha(v, v) = 0 \]

for all $v \in V$. The set of alternating bilinear forms on V is denoted by $V^{(2)}_{\text{alt}}$.


9.15 example: alternating bilinear forms 


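• Suppose $n \geq 3$ and $\alpha\colon \mathbf{F}^n \times \mathbf{F}^n \to \mathbf{F}$ is defined by

\[ \alpha\bigl((x_1, \dots, x_n), (y_1, \dots, y_n)\bigr) = x_1y_2 - x_2y_1 + x_1y_3 - x_3y_1. \]

Then $\alpha$ is an alternating bilinear form on $\mathbf{F}^n$.

• Suppose $\varphi, \tau \in V'$. Then the bilinear form $\alpha$ on V defined by

\[ \alpha(u, w) = \varphi(u)\tau(w) - \varphi(w)\tau(u) \]

is alternating.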
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The next result shows that a bilinear form is alternating if and only if switching 
the order of the two inputs multiplies the output by —1. 


9.16 characterization of alternating bilinear forms 


A bilinear form $\alpha$ on V is alternating if and only if

\[ \alpha(u, w) = -\alpha(w, u) \]

for all $u, w \in V$.


Proof First suppose that $\alpha$ is alternating. If $u, w \in V$, then

\[
\begin{aligned}
0 &= \alpha(u + w, u + w) \\
  &= \alpha(u, u) + \alpha(u, w) + \alpha(w, u) + \alpha(w, w) \\
  &= \alpha(u, w) + \alpha(w, u).
\end{aligned}
\]

Thus $\alpha(u, w) = -\alpha(w, u)$, as desired.
To prove the implication in the other direction, suppose $\alpha(u, w) = -\alpha(w, u)$
for all $u, w \in V$. Then $\alpha(v, v) = -\alpha(v, v)$ for all $v \in V$, which implies that
$\alpha(v, v) = 0$ for all $v \in V$. Thus $\alpha$ is alternating.
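The two conditions in 9.16 can be spot-checked numerically. The sketch below is only an illustration, not a proof: it evaluates the first form from Example 9.15 on a small integer grid in $\mathbf{R}^3$ (the grid and the function name `alpha` are arbitrary choices made here) and confirms both $\alpha(v, v) = 0$ and $\alpha(u, w) = -\alpha(w, u)$ on that grid.

```python
from itertools import product


def alpha(x, y):
    """First form from Example 9.15, restricted to the first three coordinates."""
    return x[0] * y[1] - x[1] * y[0] + x[0] * y[2] - x[2] * y[0]


# A small grid of test vectors; the range is an arbitrary choice.
vectors = list(product(range(-2, 3), repeat=3))

# Alternating: alpha(v, v) = 0 for every test vector.
assert all(alpha(v, v) == 0 for v in vectors)

# Equivalent condition from 9.16: swapping the inputs changes the sign.
assert all(alpha(u, w) == -alpha(w, u) for u in vectors for w in vectors)
```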


Now we show that the vector space of bilinear forms on V is the direct sum of 
the symmetric bilinear forms on V and the alternating bilinear forms on V. 


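9.17 $V^{(2)} = V^{(2)}_{\mathrm{sym}} \oplus V^{(2)}_{\mathrm{alt}}$

The sets $V^{(2)}_{\mathrm{sym}}$ and $V^{(2)}_{\mathrm{alt}}$ are subspaces of $V^{(2)}$. Furthermore,

\[ V^{(2)} = V^{(2)}_{\mathrm{sym}} \oplus V^{(2)}_{\mathrm{alt}}. \]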


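Proof The definition of symmetric bilinear form implies that the sum of any two
symmetric bilinear forms on V is a symmetric bilinear form on V, and any scalar
multiple of any symmetric bilinear form on V is a symmetric bilinear form on V.
Thus $V^{(2)}_{\mathrm{sym}}$ is a subspace of $V^{(2)}$. Similarly, the verification that
$V^{(2)}_{\mathrm{alt}}$ is a subspace of $V^{(2)}$ is straightforward.

Next, we want to show that $V^{(2)} = V^{(2)}_{\mathrm{sym}} + V^{(2)}_{\mathrm{alt}}$. To do this, suppose $\beta \in V^{(2)}$.
Define $\rho, \alpha \in V^{(2)}$ by

\[ \rho(u, w) = \frac{\beta(u, w) + \beta(w, u)}{2} \quad\text{and}\quad \alpha(u, w) = \frac{\beta(u, w) - \beta(w, u)}{2}. \]

Then $\rho \in V^{(2)}_{\mathrm{sym}}$ and $\alpha \in V^{(2)}_{\mathrm{alt}}$ and $\beta = \rho + \alpha$. Thus $V^{(2)} = V^{(2)}_{\mathrm{sym}} + V^{(2)}_{\mathrm{alt}}$.

Finally, to show that the intersection of the two subspaces under consideration
equals $\{0\}$, suppose $\beta \in V^{(2)}_{\mathrm{sym}} \cap V^{(2)}_{\mathrm{alt}}$. Then 9.16 implies that

\[ \beta(u, w) = -\beta(w, u) = -\beta(u, w) \]

for all $u, w \in V$, which implies that $\beta = 0$. Thus $V^{(2)} = V^{(2)}_{\mathrm{sym}} \oplus V^{(2)}_{\mathrm{alt}}$, as implied
by 1.46.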
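In matrix terms, the decomposition in the proof above splits the matrix A of a bilinear form into $(A + A^{\mathrm{t}})/2$ and $(A - A^{\mathrm{t}})/2$. The following minimal sketch assumes the form is given by its matrix with respect to some fixed basis; the helper name `split_form` and the sample matrix are made up for illustration.

```python
from fractions import Fraction


def split_form(A):
    """Return (R, K): the symmetric and alternating parts of the matrix A."""
    n = len(A)
    R = [[Fraction(A[j][k] + A[k][j], 2) for k in range(n)] for j in range(n)]
    K = [[Fraction(A[j][k] - A[k][j], 2) for k in range(n)] for j in range(n)]
    return R, K


# An arbitrary 2-by-2 example matrix with integer entries.
A = [[1, 2],
     [5, 3]]
R, K = split_form(A)

n = len(A)
# R is symmetric, K has zero diagonal and is skew-symmetric, and R + K = A.
assert all(R[j][k] == R[k][j] for j in range(n) for k in range(n))
assert all(K[j][k] == -K[k][j] for j in range(n) for k in range(n))
assert all(R[j][k] + K[j][k] == A[j][k] for j in range(n) for k in range(n))
```

Exact fractions are used so that the division by 2 introduces no rounding.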


Quadratic Forms


9.18 definition: quadratic form associated with a bilinear form, $q_\beta$


For $\beta$ a bilinear form on V, define a function $q_\beta\colon V \to \mathbf{F}$ by $q_\beta(v) = \beta(v, v)$.
A function $q\colon V \to \mathbf{F}$ is called a quadratic form on V if there exists a bilinear
form $\beta$ on V such that $q = q_\beta$.


Note that if $\beta$ is a bilinear form, then $q_\beta = 0$ if and only if $\beta$ is alternating.


9.19 example: quadratic form 


Suppose $\beta$ is the bilinear form on $\mathbf{R}^3$ defined by

\[ \beta\bigl((x_1, x_2, x_3), (y_1, y_2, y_3)\bigr) = x_1 y_1 - 4x_1 y_2 + 8x_1 y_3 - 3x_3 y_3. \]

Then $q_\beta$ is the quadratic form on $\mathbf{R}^3$ given by the formula

\[ q_\beta(x_1, x_2, x_3) = x_1^2 - 4x_1 x_2 + 8x_1 x_3 - 3x_3^2. \]


The quadratic form in the example above is typical of quadratic forms on $\mathbf{F}^n$,
as shown in the next result.


9.20 quadratic forms on $\mathbf{F}^n$


Suppose n is a positive integer and q is a function from $\mathbf{F}^n$ to $\mathbf{F}$. Then q
is a quadratic form on $\mathbf{F}^n$ if and only if there exist numbers $A_{j,k} \in \mathbf{F}$ for
$j, k \in \{1, \dots, n\}$ such that

\[ q(x_1, \dots, x_n) = \sum_{k=1}^{n} \sum_{j=1}^{n} A_{j,k}\, x_j x_k \]

for all $(x_1, \dots, x_n) \in \mathbf{F}^n$.


Proof First suppose q is a quadratic form on $\mathbf{F}^n$. Thus there exists a bilinear form
$\beta$ on $\mathbf{F}^n$ such that $q = q_\beta$. Let A be the matrix of $\beta$ with respect to the standard
basis of $\mathbf{F}^n$. Then for all $(x_1, \dots, x_n) \in \mathbf{F}^n$, we have the desired equation

\[ q(x_1, \dots, x_n) = \beta\bigl((x_1, \dots, x_n), (x_1, \dots, x_n)\bigr) = \sum_{k=1}^{n} \sum_{j=1}^{n} A_{j,k}\, x_j x_k. \]

Conversely, suppose there exist numbers $A_{j,k} \in \mathbf{F}$ for $j, k \in \{1, \dots, n\}$ such that

\[ q(x_1, \dots, x_n) = \sum_{k=1}^{n} \sum_{j=1}^{n} A_{j,k}\, x_j x_k \]

for all $(x_1, \dots, x_n) \in \mathbf{F}^n$. Define a bilinear form $\beta$ on $\mathbf{F}^n$ by

\[ \beta\bigl((x_1, \dots, x_n), (y_1, \dots, y_n)\bigr) = \sum_{k=1}^{n} \sum_{j=1}^{n} A_{j,k}\, x_j y_k. \]

Then $q = q_\beta$, as desired.

Although quadratic forms are defined in terms of an arbitrary bilinear form,
the equivalence of (a) and (b) in the result below shows that a symmetric bilinear
form can always be used.


9.21 characterization of quadratic forms


Suppose $q\colon V \to \mathbf{F}$ is a function. The following are equivalent.

(a) q is a quadratic form.

(b) There exists a unique symmetric bilinear form $\rho$ on V such that $q = q_\rho$.

(c) $q(\lambda v) = \lambda^2 q(v)$ for all $\lambda \in \mathbf{F}$ and all $v \in V$, and the function
$(u, w) \mapsto q(u + w) - q(u) - q(w)$
is a symmetric bilinear form on V.

(d) $q(2v) = 4q(v)$ for all $v \in V$, and the function
$(u, w) \mapsto q(u + w) - q(u) - q(w)$
is a symmetric bilinear form on V.


Proof First suppose (a) holds, so q is a quadratic form. Hence there exists a
bilinear form $\beta$ such that $q = q_\beta$. By 9.17, there exist a symmetric bilinear form $\rho$
on V and an alternating bilinear form $\alpha$ on V such that $\beta = \rho + \alpha$. Now

\[ q = q_\beta = q_\rho + q_\alpha = q_\rho. \]

If $\rho' \in V^{(2)}_{\mathrm{sym}}$ also satisfies $q_{\rho'} = q$, then $q_{\rho' - \rho} = 0$; thus $\rho' - \rho \in V^{(2)}_{\mathrm{sym}} \cap V^{(2)}_{\mathrm{alt}}$,
which implies that $\rho' = \rho$ (by 9.17). This completes the proof that (a) implies (b).

Now suppose (b) holds, so there exists a symmetric bilinear form $\rho$ on V such
that $q = q_\rho$. If $\lambda \in \mathbf{F}$ and $v \in V$ then

\[ q(\lambda v) = \rho(\lambda v, \lambda v) = \lambda\rho(v, \lambda v) = \lambda^2\rho(v, v) = \lambda^2 q(v), \]

showing that the first part of (c) holds.
If $u, w \in V$, then

\[ q(u + w) - q(u) - q(w) = \rho(u + w, u + w) - \rho(u, u) - \rho(w, w) = 2\rho(u, w). \]

Thus the function $(u, w) \mapsto q(u + w) - q(u) - q(w)$ equals $2\rho$, which is a symmetric
bilinear form on V, completing the proof that (b) implies (c).

Clearly (c) implies (d).

Now suppose (d) holds. Let $\rho$ be the symmetric bilinear form on V defined by

\[ \rho(u, w) = \frac{q(u + w) - q(u) - q(w)}{2}. \]

If $v \in V$, then

\[ \rho(v, v) = \frac{q(2v) - q(v) - q(v)}{2} = \frac{4q(v) - 2q(v)}{2} = q(v). \]

Thus $q = q_\rho$, completing the proof that (d) implies (a).




9.22 example: symmetric bilinear form associated with a quadratic form 


Suppose q is the quadratic form on $\mathbf{R}^3$ given by the formula

\[ q(x_1, x_2, x_3) = x_1^2 - 4x_1 x_2 + 8x_1 x_3 - 3x_3^2. \]

A bilinear form $\beta$ on $\mathbf{R}^3$ such that $q = q_\beta$ is given by Example 9.19, but that
bilinear form is not symmetric, so it is not the symmetric form promised by 9.21(b).
However, the bilinear form $\rho$ on $\mathbf{R}^3$ defined by

\[ \rho\bigl((x_1, x_2, x_3), (y_1, y_2, y_3)\bigr) = x_1 y_1 - 2x_1 y_2 - 2x_2 y_1 + 4x_1 y_3 + 4x_3 y_1 - 3x_3 y_3 \]

is symmetric and satisfies $q = q_\rho$.
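The symmetric form in this example can be recovered mechanically from q by the polarization formula used in the proof of 9.21. The sketch below is a small check under that formula; the function names `q` and `rho` and the use of exact fractions are choices made here for illustration.

```python
from fractions import Fraction


def q(x):
    """The quadratic form of Example 9.22 on R^3."""
    x1, x2, x3 = x
    return x1**2 - 4 * x1 * x2 + 8 * x1 * x3 - 3 * x3**2


def rho(u, w):
    """Polarization from the proof of 9.21: (q(u + w) - q(u) - q(w)) / 2."""
    s = tuple(a + b for a, b in zip(u, w))
    return Fraction(q(s) - q(u) - q(w), 2)


# Matrix of rho with respect to the standard basis of R^3.
E = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
M = [[rho(ej, ek) for ek in E] for ej in E]

# The entries match the coefficients of rho in Example 9.22.
assert M == [[1, -2, 4], [-2, 0, 0], [4, 0, -3]]
# rho(v, v) reproduces q(v) on a sample vector.
assert rho((2, -1, 3), (2, -1, 3)) == q((2, -1, 3))
```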


The next result states that for each quadratic form we can choose a basis such
that the quadratic form looks like a weighted sum of squares of the coordinates,
meaning that there are no cross terms of the form $x_j x_k$ with $j \ne k$.


9.23 diagonalization of quadratic form 


Suppose q is a quadratic form on V.

(a) There exist a basis $e_1, \dots, e_n$ of V and $\lambda_1, \dots, \lambda_n \in \mathbf{F}$ such that

\[ q(x_1 e_1 + \cdots + x_n e_n) = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2 \]

for all $x_1, \dots, x_n \in \mathbf{F}$.

(b) If $\mathbf{F} = \mathbf{R}$ and V is an inner product space, then the basis in (a) can be
chosen to be an orthonormal basis of V.


Proof
(a) There exists a symmetric bilinear form $\rho$ on V such that $q = q_\rho$ (by 9.21). Now
there exists a basis $e_1, \dots, e_n$ of V such that $\mathcal{M}(\rho, (e_1, \dots, e_n))$ is a diagonal
matrix (by 9.12). Let $\lambda_1, \dots, \lambda_n$ denote the entries on the diagonal of this
matrix. Thus

\[ \rho(e_j, e_k) = \begin{cases} \lambda_j & \text{if } j = k, \\ 0 & \text{if } j \ne k \end{cases} \]

for all $j, k \in \{1, \dots, n\}$. If $x_1, \dots, x_n \in \mathbf{F}$, then

\[
\begin{aligned}
q(x_1 e_1 + \cdots + x_n e_n) &= \rho(x_1 e_1 + \cdots + x_n e_n,\; x_1 e_1 + \cdots + x_n e_n) \\
&= \sum_{k=1}^{n} \sum_{j=1}^{n} x_j x_k\, \rho(e_j, e_k) \\
&= \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2,
\end{aligned}
\]

as desired.

(b) Suppose $\mathbf{F} = \mathbf{R}$ and V is an inner product space. Then 9.13 tells us that the
basis in (a) can be chosen to be an orthonormal basis of V.

Exercises 9A


1 Prove that if $\beta$ is a bilinear form on $\mathbf{F}$, then there exists $c \in \mathbf{F}$ such that

\[ \beta(x, y) = cxy \]

for all $x, y \in \mathbf{F}$.

2 Let n = dim V. Suppose $\beta$ is a bilinear form on V. Prove that there exist
$\varphi_1, \dots, \varphi_n, \tau_1, \dots, \tau_n \in V'$ such that

\[ \beta(u, v) = \varphi_1(u) \cdot \tau_1(v) + \cdots + \varphi_n(u) \cdot \tau_n(v) \]

for all $u, v \in V$.
This exercise shows that if n = dim V, then every bilinear form on V is of
the form given by the last bullet point of Example 9.2.

3 Suppose $\beta\colon V \times V \to \mathbf{F}$ is a bilinear form on V and also is a linear functional
on $V \times V$. Prove that $\beta = 0$.

4 Suppose V is a real inner product space and $\beta$ is a bilinear form on V. Show
that there exists a unique operator $T \in \mathcal{L}(V)$ such that

\[ \beta(u, v) = \langle u, Tv \rangle \]

for all $u, v \in V$.
This exercise states that if V is a real inner product space, then every
bilinear form on V is of the form given by the third bullet point in 9.2.

5 Suppose $\beta$ is a bilinear form on a real inner product space V and T is the
unique operator on V such that $\beta(u, v) = \langle u, Tv \rangle$ for all $u, v \in V$ (see
Exercise 4). Show that $\beta$ is an inner product on V if and only if T is an
invertible positive operator on V.

6 Prove or give a counterexample: If $\rho$ is a symmetric bilinear form on V, then

\[ \{v \in V : \rho(v, v) = 0\} \]

is a subspace of V.

7 Explain why the proof of 9.13 (diagonalization of a symmetric bilinear form
by an orthonormal basis on a real inner product space) fails if the hypothesis
that $\mathbf{F} = \mathbf{R}$ is dropped.

8 Find formulas for $\dim V^{(2)}_{\mathrm{sym}}$ and $\dim V^{(2)}_{\mathrm{alt}}$ in terms of dim V.

9 Suppose that n is a positive integer and $V = \{p \in \mathcal{P}_n(\mathbf{R}) : p(0) = p(1)\}$.
Define $\alpha\colon V \times V \to \mathbf{R}$ by

\[ \alpha(p, q) = \int_0^1 p\,q'. \]

Show that $\alpha$ is an alternating bilinear form on V.

10 Suppose that n is a positive integer and
$V = \{p \in \mathcal{P}_n(\mathbf{R}) : p(0) = p(1) \text{ and } p'(0) = p'(1)\}$.
Define $\rho\colon V \times V \to \mathbf{R}$ by

\[ \rho(p, q) = \int_0^1 p\,q''. \]

Show that $\rho$ is a symmetric bilinear form on V.

9B Alternating Multilinear Forms


Multilinear Forms 


9.24 definition: $V^m$


For m a positive integer, define $V^m$ by

\[ V^m = \underbrace{V \times \cdots \times V}_{m \text{ times}}. \]


Now we can define m-linear forms as a generalization of the bilinear forms 
that we discussed in the previous section. 


9.25 definition: m-linear form, $V^{(m)}$, multilinear form


• For m a positive integer, an m-linear form on V is a function $\beta\colon V^m \to \mathbf{F}$
that is linear in each slot when the other slots are held fixed. This means
that for each $k \in \{1, \dots, m\}$ and all $u_1, \dots, u_m \in V$, the function

\[ v \mapsto \beta(u_1, \dots, u_{k-1}, v, u_{k+1}, \dots, u_m) \]

is a linear map from V to $\mathbf{F}$.

• The set of m-linear forms on V is denoted by $V^{(m)}$.

• A function $\beta$ is called a multilinear form on V if it is an m-linear form on V
for some positive integer m.


In the definition above, the expression $\beta(u_1, \dots, u_{k-1}, v, u_{k+1}, \dots, u_m)$ means
$\beta(v, u_2, \dots, u_m)$ if k = 1 and means $\beta(u_1, \dots, u_{m-1}, v)$ if k = m.

A 1-linear form on V is a linear functional on V. A 2-linear form on V is
a bilinear form on V. You can verify that with the usual addition and scalar
multiplication of functions, $V^{(m)}$ is a vector space.


9.26 example: m-linear forms 


• Suppose $\alpha, \rho \in V^{(2)}$. Define a function $\beta\colon V^4 \to \mathbf{F}$ by

\[ \beta(v_1, v_2, v_3, v_4) = \alpha(v_1, v_2)\,\rho(v_3, v_4). \]

Then $\beta \in V^{(4)}$.

• Define $\beta\colon (\mathcal{L}(V))^m \to \mathbf{F}$ by

\[ \beta(T_1, \dots, T_m) = \operatorname{tr}(T_1 \cdots T_m). \]

Then $\beta$ is an m-linear form on $\mathcal{L}(V)$.

Alternating multilinear forms, which we now define, play an important role as
we head toward defining determinants.


9.27 definition: alternating forms, $V^{(m)}_{\mathrm{alt}}$


Suppose m is a positive integer.

• An m-linear form $\alpha$ on V is called alternating if $\alpha(v_1, \dots, v_m) = 0$ whenever
$v_1, \dots, v_m$ is a list of vectors in V with $v_j = v_k$ for some two distinct values
of j and k in $\{1, \dots, m\}$.

• $V^{(m)}_{\mathrm{alt}} = \{\alpha \in V^{(m)} : \alpha \text{ is an alternating m-linear form on } V\}$.


You should verify that $V^{(m)}_{\mathrm{alt}}$ is a subspace of $V^{(m)}$. See Example 9.15 for
examples of alternating 2-linear forms. See Exercise 2 for an example of an
alternating 3-linear form.

The next result tells us that if a linearly dependent list is input to an alternating
multilinear form, then the output equals 0.


9.28 alternating multilinear forms and linear dependence 


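Suppose m is a positive integer and $\alpha$ is an alternating m-linear form on V. If
$v_1, \dots, v_m$ is a linearly dependent list in V, then

\[ \alpha(v_1, \dots, v_m) = 0. \]


Proof Suppose $v_1, \dots, v_m$ is a linearly dependent list in V. By the linear dependence
lemma (2.19), some $v_k$ is a linear combination of $v_1, \dots, v_{k-1}$. Thus there
exist $b_1, \dots, b_{k-1}$ such that $v_k = b_1 v_1 + \cdots + b_{k-1} v_{k-1}$. Now

\[
\begin{aligned}
\alpha(v_1, \dots, v_m) &= \alpha\Bigl(v_1, \dots, v_{k-1}, \sum_{j=1}^{k-1} b_j v_j, v_{k+1}, \dots, v_m\Bigr) \\
&= \sum_{j=1}^{k-1} b_j\, \alpha(v_1, \dots, v_{k-1}, v_j, v_{k+1}, \dots, v_m) \\
&= 0,
\end{aligned}
\]

where the last equality holds because each term in the sum has $v_j$ in two different slots.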


The next result states that if m > dim V, then there are no alternating m-linear
forms on V other than the function on $V^m$ that is identically 0.


9.29 no nonzero alternating m-linear forms for m > dim V


Suppose m > dim V. Then 0 is the only alternating m-linear form on V.


Proof Suppose that $\alpha$ is an alternating m-linear form on V and $v_1, \dots, v_m \in V$.
Because m > dim V, this list is not linearly independent (by 2.22). Thus 9.28
implies that $\alpha(v_1, \dots, v_m) = 0$. Hence $\alpha$ is the zero function from $V^m$ to $\mathbf{F}$.

Alternating Multilinear Forms and Permutations


9.30 swapping input vectors in an alternating multilinear form 


Suppose m is a positive integer, $\alpha$ is an alternating m-linear form on V, and
$v_1, \dots, v_m$ is a list of vectors in V. Then swapping the vectors in any two slots
of $\alpha(v_1, \dots, v_m)$ changes the value of $\alpha$ by a factor of $-1$.


Proof Put $v_1 + v_2$ in both the first two slots, getting

\[ 0 = \alpha(v_1 + v_2, v_1 + v_2, v_3, \dots, v_m). \]

Use the multilinear properties of $\alpha$ to expand the right side of the equation above
(as in the proof of 9.16) to get

\[ \alpha(v_2, v_1, v_3, \dots, v_m) = -\alpha(v_1, v_2, v_3, \dots, v_m). \]

Similarly, swapping the vectors in any two slots of $\alpha(v_1, \dots, v_m)$ changes the
value of $\alpha$ by a factor of $-1$.


To see what can happen with multiple swaps, suppose $\alpha$ is an alternating
3-linear form on V and $v_1, v_2, v_3 \in V$. To evaluate $\alpha(v_3, v_1, v_2)$ in terms of
$\alpha(v_1, v_2, v_3)$, start with $\alpha(v_3, v_1, v_2)$ and swap the entries in the first and third
slots, getting $\alpha(v_3, v_1, v_2) = -\alpha(v_2, v_1, v_3)$. Now in the last expression, swap the
entries in the first and second slots, getting

\[ \alpha(v_3, v_1, v_2) = -\alpha(v_2, v_1, v_3) = \alpha(v_1, v_2, v_3). \]

More generally, we see that if we do an odd number of swaps, then the value of $\alpha$
changes by a factor of $-1$, and if we do an even number of swaps, then the value
of $\alpha$ does not change.

To deal with arbitrary multiple swaps, we need a bit of information about 
permutations. 


9.31 definition: permutation, perm m


Suppose m is a positive integer.

• A permutation of $(1, \dots, m)$ is a list $(j_1, \dots, j_m)$ that contains each of the
numbers $1, \dots, m$ exactly once.

• The set of all permutations of $(1, \dots, m)$ is denoted by perm m.


For example, $(2, 3, 4, 5, 1) \in \operatorname{perm} 5$. You should think of an element of
perm m as a rearrangement of the first m positive integers.

The number of swaps used to change a permutation $(j_1, \dots, j_m)$ to the standard
order $(1, \dots, m)$ can depend on the specific swaps selected. The following
definition has the advantage of assigning a well-defined sign to every permutation.

9.32 definition: sign of a permutation


The sign of a permutation $(j_1, \dots, j_m)$ is defined by

\[ \operatorname{sign}(j_1, \dots, j_m) = (-1)^N, \]

where N is the number of pairs of integers $(k, \ell)$ with $1 \le k < \ell \le m$ such
that k appears after $\ell$ in the list $(j_1, \dots, j_m)$.


Hence the sign of a permutation equals 1 if the natural order has been changed
an even number of times and equals $-1$ if the natural order has been changed an
odd number of times.


9.33 example: signs 


• The permutation $(1, \dots, m)$ [no changes in the natural order] has sign 1.

• The only pair of integers $(k, \ell)$ with $k < \ell$ such that k appears after $\ell$ in the list
$(2, 1, 3, 4)$ is $(1, 2)$. Thus the permutation $(2, 1, 3, 4)$ has sign $-1$.

• In the permutation $(2, 3, \dots, m, 1)$, the only pairs $(k, \ell)$ with $k < \ell$ that appear
with changed order are $(1, 2), (1, 3), \dots, (1, m)$. Because we have $m - 1$ such
pairs, the sign of this permutation equals $(-1)^{m-1}$.
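The counting in definition 9.32 translates directly into a few lines of code. The sketch below is one straightforward way to do it, not an optimized algorithm: it counts the pairs that appear out of their natural order and raises $-1$ to that count. The function name `sign` is a choice made here.

```python
def sign(perm):
    """Sign of a permutation of (1, ..., m), following definition 9.32."""
    m = len(perm)
    # N counts pairs of positions a < b whose entries appear out of natural order,
    # which is the same as counting pairs k < l with k appearing after l.
    n_out_of_order = sum(1 for a in range(m) for b in range(a + 1, m)
                         if perm[a] > perm[b])
    return (-1) ** n_out_of_order


# The three bullet points of Example 9.33, with m = 5 in the last one.
assert sign((1, 2, 3, 4)) == 1
assert sign((2, 1, 3, 4)) == -1
assert sign((2, 3, 4, 5, 1)) == (-1) ** (5 - 1)
```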


9.34 swapping two entries in a permutation 


Swapping two entries in a permutation multiplies the sign of the permutation
by $-1$.


Proof Suppose we have two permutations, where the second permutation is 
obtained from the first by swapping two entries. The two swapped entries were 
in their natural order in the first permutation if and only if they are not in their 
natural order in the second permutation. Thus we have a net change (so far) of 1
or $-1$ (both odd numbers) in the number of pairs not in their natural order.

Consider each entry between the two swapped entries. If an intermediate entry 
was originally in the natural order with respect to both swapped entries, then it 
is now in the natural order with respect to neither swapped entry. Similarly, if 
an intermediate entry was originally in the natural order with respect to neither 
of the swapped entries, then it is now in the natural order with respect to both 
swapped entries. If an intermediate entry was originally in the natural order with 
respect to exactly one of the swapped entries, then that is still true. Thus the net 
change (for each pair containing an entry between the two swapped entries) in the
number of pairs not in their natural order is 2, $-2$, or 0 (all even numbers).

For all other pairs of entries, there is no change in whether or not they are in 
their natural order. Thus the total net change in the number of pairs not in their 
natural order is an odd number. Hence the sign of the second permutation equals
$-1$ times the sign of the first permutation.
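Result 9.34 can be checked exhaustively for small m. The sketch below is a sanity check rather than a proof: it tries every permutation of $(1, \dots, m)$ and every pair of positions, and confirms that a swap always flips the sign. The helper names `sign`, `swap`, and `swap_flips_sign` are made up for this illustration.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation, computed by counting out-of-order pairs (9.32)."""
    m = len(perm)
    n_out_of_order = sum(1 for a in range(m) for b in range(a + 1, m)
                         if perm[a] > perm[b])
    return (-1) ** n_out_of_order


def swap(perm, a, b):
    """Return a copy of perm with the entries in positions a and b exchanged."""
    p = list(perm)
    p[a], p[b] = p[b], p[a]
    return tuple(p)


def swap_flips_sign(m):
    """True if every swap of two entries in every permutation of (1..m) flips the sign."""
    return all(sign(swap(p, a, b)) == -sign(p)
               for p in permutations(range(1, m + 1))
               for a in range(m) for b in range(a + 1, m))


# Exhaustive check for m = 2, ..., 5 (at most 120 permutations each).
assert all(swap_flips_sign(m) for m in range(2, 6))
```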
9.35 permutations and alternating multilinear forms


Suppose m is a positive integer and $\alpha \in V^{(m)}_{\mathrm{alt}}$. Then

\[ \alpha(v_{j_1}, \dots, v_{j_m}) = \bigl(\operatorname{sign}(j_1, \dots, j_m)\bigr)\,\alpha(v_1, \dots, v_m) \]

for every list $v_1, \dots, v_m$ of vectors in V and all $(j_1, \dots, j_m) \in \operatorname{perm} m$.


Proof Suppose $v_1, \dots, v_m \in V$ and $(j_1, \dots, j_m) \in \operatorname{perm} m$. We can get from
$(j_1, \dots, j_m)$ to $(1, \dots, m)$ by a series of swaps of entries in different slots. Each such
swap changes the value of $\alpha$ by a factor of $-1$ (by 9.30) and also changes the sign
of the remaining permutation by a factor of $-1$ (by 9.34). After an appropriate
number of swaps, we reach the permutation $1, \dots, m$, which has sign 1. Thus the
value of $\alpha$ changed signs an even number of times if $\operatorname{sign}(j_1, \dots, j_m) = 1$ and an
odd number of times if $\operatorname{sign}(j_1, \dots, j_m) = -1$, which gives the desired result.


Our use of permutations now leads in a natural way to the following beautiful 
formula for alternating n-linear forms on an n-dimensional vector space. 


9.36 formula for (dim V)-linear alternating forms on V 


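Let n = dim V. Suppose $e_1, \dots, e_n$ is a basis of V and $v_1, \dots, v_n \in V$. For each
$k \in \{1, \dots, n\}$, let $b_{1,k}, \dots, b_{n,k} \in \mathbf{F}$ be such that

\[ v_k = \sum_{j=1}^{n} b_{j,k}\, e_j. \]

Then

\[ \alpha(v_1, \dots, v_n) = \alpha(e_1, \dots, e_n) \sum_{(j_1, \dots, j_n) \in \operatorname{perm} n} \bigl(\operatorname{sign}(j_1, \dots, j_n)\bigr)\, b_{j_1,1} \cdots b_{j_n,n} \]

for every alternating n-linear form $\alpha$ on V.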


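Proof Suppose $\alpha$ is an alternating n-linear form on V. Then

\[
\begin{aligned}
\alpha(v_1, \dots, v_n) &= \alpha\Bigl(\sum_{j_1=1}^{n} b_{j_1,1}\, e_{j_1}, \dots, \sum_{j_n=1}^{n} b_{j_n,n}\, e_{j_n}\Bigr) \\
&= \sum_{j_1=1}^{n} \cdots \sum_{j_n=1}^{n} b_{j_1,1} \cdots b_{j_n,n}\, \alpha(e_{j_1}, \dots, e_{j_n}) \\
&= \sum_{(j_1, \dots, j_n) \in \operatorname{perm} n} b_{j_1,1} \cdots b_{j_n,n}\, \alpha(e_{j_1}, \dots, e_{j_n}) \\
&= \alpha(e_1, \dots, e_n) \sum_{(j_1, \dots, j_n) \in \operatorname{perm} n} \bigl(\operatorname{sign}(j_1, \dots, j_n)\bigr)\, b_{j_1,1} \cdots b_{j_n,n},
\end{aligned}
\]

where the third line holds because $\alpha(e_{j_1}, \dots, e_{j_n}) = 0$ if $j_1, \dots, j_n$ are not distinct
integers, and the last line holds by 9.35.

The following result will be the key to our definition of the determinant in the
next section.


9.37 $\dim V^{(\dim V)}_{\mathrm{alt}} = 1$


The vector space $V^{(\dim V)}_{\mathrm{alt}}$ has dimension one.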


Proof Let n = dim V. Suppose $\alpha$ and $\alpha'$ are alternating n-linear forms on V with
$\alpha \ne 0$. Let $e_1, \dots, e_n$ be such that $\alpha(e_1, \dots, e_n) \ne 0$. There exists $c \in \mathbf{F}$ such that

\[ \alpha'(e_1, \dots, e_n) = c\,\alpha(e_1, \dots, e_n). \]

Furthermore, 9.28 implies that $e_1, \dots, e_n$ is linearly independent and thus is a basis
of V.
Suppose $v_1, \dots, v_n \in V$. Let $b_{j,k}$ be as in 9.36 for $j, k = 1, \dots, n$. Then

\[
\begin{aligned}
\alpha'(v_1, \dots, v_n) &= \alpha'(e_1, \dots, e_n) \sum_{(j_1, \dots, j_n) \in \operatorname{perm} n} \bigl(\operatorname{sign}(j_1, \dots, j_n)\bigr)\, b_{j_1,1} \cdots b_{j_n,n} \\
&= c\,\alpha(e_1, \dots, e_n) \sum_{(j_1, \dots, j_n) \in \operatorname{perm} n} \bigl(\operatorname{sign}(j_1, \dots, j_n)\bigr)\, b_{j_1,1} \cdots b_{j_n,n} \\
&= c\,\alpha(v_1, \dots, v_n),
\end{aligned}
\]

where the first and last lines above come from 9.36. The equation above implies
that $\alpha' = c\alpha$. Thus $\alpha', \alpha$ is not a linearly independent list, which implies that
$\dim V^{(n)}_{\mathrm{alt}} \le 1$.

To complete the proof, we only need to show that there exists a nonzero
alternating n-linear form $\alpha$ on V (thus eliminating the possibility that $\dim V^{(n)}_{\mathrm{alt}}$
equals 0). To do this, let $e_1, \dots, e_n$ be any basis of V, and let $\varphi_1, \dots, \varphi_n \in V'$ be
the linear functionals on V that allow us to express each element of V as a linear
combination of $e_1, \dots, e_n$. In other words,

\[ v = \sum_{j=1}^{n} \varphi_j(v)\, e_j \]

for every $v \in V$ (see 3.114). Now for $v_1, \dots, v_n \in V$, define

9.38 \[ \alpha(v_1, \dots, v_n) = \sum_{(j_1, \dots, j_n) \in \operatorname{perm} n} \bigl(\operatorname{sign}(j_1, \dots, j_n)\bigr)\, \varphi_{j_1}(v_1) \cdots \varphi_{j_n}(v_n). \]

The verification that $\alpha$ is an n-linear form on V is straightforward.

To see that $\alpha$ is alternating, suppose $v_1, \dots, v_n \in V$ with $v_1 = v_2$. For each
$(j_1, \dots, j_n) \in \operatorname{perm} n$, the permutation $(j_2, j_1, j_3, \dots, j_n)$ has the opposite sign.
Because $v_1 = v_2$, the contributions from these two permutations to the sum in 9.38
cancel each other. Hence $\alpha(v_1, v_1, v_3, \dots, v_n) = 0$. Similarly, $\alpha(v_1, \dots, v_n) = 0$ if
any two vectors in the list $v_1, \dots, v_n$ are equal. Thus $\alpha$ is alternating.

Finally, consider 9.38 with each $v_k = e_k$. Because $\varphi_j(e_k)$ equals 0 if $j \ne k$ and
equals 1 if $j = k$, only the permutation $(1, \dots, n)$ makes a nonzero contribution to
the right side of 9.38 in this case, giving the equation $\alpha(e_1, \dots, e_n) = 1$. Thus we
have produced a nonzero alternating n-linear form $\alpha$ on V, as desired.

[Margin note: The formula 9.38 used in the last proof to construct a nonzero alternating
n-linear form came from the formula in 9.36, and that formula arose naturally
from the properties of an alternating multilinear form.]

Earlier we showed that the value of an alternating multilinear form applied
to a linearly dependent list is 0; see 9.28. The next result provides a converse of
9.28 for n-linear multilinear forms when n = dim V. In the following result, the
statement that $\alpha$ is nonzero means (as usual for a function) that $\alpha$ is not the
function on $V^n$ that is identically 0.


9.39 alternating (dim V)-linear forms and linear independence 


Let n = dim V. Suppose $\alpha$ is a nonzero alternating n-linear form on V and
$e_1, \dots, e_n$ is a list of vectors in V. Then

\[ \alpha(e_1, \dots, e_n) \ne 0 \]

if and only if $e_1, \dots, e_n$ is linearly independent.


Proof First suppose $\alpha(e_1, \dots, e_n) \ne 0$. Then 9.28 implies that $e_1, \dots, e_n$ is linearly
independent.

To prove the implication in the other direction, now suppose $e_1, \dots, e_n$ is linearly
independent. Because n = dim V, this implies that $e_1, \dots, e_n$ is a basis of V (see
2.38).

Because $\alpha$ is not the zero n-linear form, there exist $v_1, \dots, v_n \in V$ such that
$\alpha(v_1, \dots, v_n) \ne 0$. Now 9.36 implies that $\alpha(e_1, \dots, e_n) \ne 0$.


Exercises 9B 


1 Suppose m is a positive integer. Show that $\dim V^{(m)} = (\dim V)^m$.

2 Suppose $n \ge 3$ and $\alpha\colon \mathbf{F}^n \times \mathbf{F}^n \times \mathbf{F}^n \to \mathbf{F}$ is defined by

\[
\begin{aligned}
\alpha\bigl((x_1, \dots, x_n), (y_1, \dots, y_n), (z_1, \dots, z_n)\bigr)
= {} & x_1 y_2 z_3 - x_2 y_1 z_3 - x_3 y_2 z_1 \\
& - x_1 y_3 z_2 + x_3 y_1 z_2 + x_2 y_3 z_1.
\end{aligned}
\]

Show that $\alpha$ is an alternating 3-linear form on $\mathbf{F}^n$.

3 Suppose m is a positive integer and $\alpha$ is an m-linear form on V such that
$\alpha(v_1, \dots, v_m) = 0$ whenever $v_1, \dots, v_m$ is a list of vectors in V with $v_j = v_{j+1}$
for some $j \in \{1, \dots, m-1\}$. Prove that $\alpha$ is an alternating m-linear form on V.

4 Prove or give a counterexample: If $\alpha \in V^{(4)}_{\mathrm{alt}}$, then

\[ \{(v_1, v_2, v_3, v_4) \in V^4 : \alpha(v_1, v_2, v_3, v_4) = 0\} \]

is a subspace of $V^4$.

5 Suppose m is a positive integer and $\beta$ is an m-linear form on V. Define an
m-linear form $\alpha$ on V by

\[ \alpha(v_1, \dots, v_m) = \sum_{(j_1, \dots, j_m) \in \operatorname{perm} m} \bigl(\operatorname{sign}(j_1, \dots, j_m)\bigr)\, \beta(v_{j_1}, \dots, v_{j_m}) \]

for $v_1, \dots, v_m \in V$. Explain why $\alpha \in V^{(m)}_{\mathrm{alt}}$.

6 Suppose m is a positive integer and $\beta$ is an m-linear form on V. Define an
m-linear form $\alpha$ on V by

\[ \alpha(v_1, \dots, v_m) = \sum_{(j_1, \dots, j_m) \in \operatorname{perm} m} \beta(v_{j_1}, \dots, v_{j_m}) \]

for $v_1, \dots, v_m \in V$. Explain why

\[ \alpha(v_{k_1}, \dots, v_{k_m}) = \alpha(v_1, \dots, v_m) \]

for all $v_1, \dots, v_m \in V$ and all $(k_1, \dots, k_m) \in \operatorname{perm} m$.

7 Give an example of a nonzero alternating 2-linear form $\alpha$ on $\mathbf{R}^3$ and a linearly
independent list $v_1, v_2$ in $\mathbf{R}^3$ such that $\alpha(v_1, v_2) = 0$.
This exercise shows that 9.39 can fail if the hypothesis that n = dim V is
deleted.

Defining the Determinant


The next definition will lead us to a clean, beautiful, basis-free definition of the 
determinant of an operator. 


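9.40 definition: $\alpha_T$


Suppose that m is a positive integer and $T \in \mathcal{L}(V)$. For $\alpha \in V^{(m)}_{\mathrm{alt}}$, define
$\alpha_T \in V^{(m)}_{\mathrm{alt}}$ by

\[ \alpha_T(v_1, \dots, v_m) = \alpha(Tv_1, \dots, Tv_m) \]

for each list $v_1, \dots, v_m$ of vectors in V.


Suppose $T \in \mathcal{L}(V)$. If $\alpha \in V^{(m)}_{\mathrm{alt}}$ and $v_1, \dots, v_m$ is a list of vectors in V with
$v_j = v_k$ for some $j \ne k$, then $Tv_j = Tv_k$, which implies that $\alpha_T(v_1, \dots, v_m) =
\alpha(Tv_1, \dots, Tv_m) = 0$. Thus the function $\alpha \mapsto \alpha_T$ is a linear map of $V^{(m)}_{\mathrm{alt}}$ to itself.

We know that $\dim V^{(\dim V)}_{\mathrm{alt}} = 1$ (see 9.37). Every linear map from a one-dimensional
vector space to itself is multiplication by some unique scalar. For
the linear map $\alpha \mapsto \alpha_T$, we now define det T to be that scalar.


9.41 definition: determinant of an operator, det T


Suppose $T \in \mathcal{L}(V)$. The determinant of T, denoted by det T, is defined to be
the unique number in $\mathbf{F}$ such that

\[ \alpha_T = (\det T)\,\alpha \]

for all $\alpha \in V^{(\dim V)}_{\mathrm{alt}}$.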


9.42 example: determinants of operators 


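Let n = dim V.

• If I is the identity operator on V, then $\alpha_I = \alpha$ for all $\alpha \in V^{(n)}_{\mathrm{alt}}$. Thus $\det I = 1$.

• More generally, if $\lambda \in \mathbf{F}$, then $\alpha_{\lambda I} = \lambda^n \alpha$ for all $\alpha \in V^{(n)}_{\mathrm{alt}}$. Thus $\det(\lambda I) = \lambda^n$.

• Still more generally, if $T \in \mathcal{L}(V)$ and $\lambda \in \mathbf{F}$, then $\alpha_{\lambda T} = \lambda^n \alpha_T = \lambda^n (\det T)\alpha$
for all $\alpha \in V^{(n)}_{\mathrm{alt}}$. Thus $\det(\lambda T) = \lambda^n \det T$.

• Suppose $T \in \mathcal{L}(V)$ and there is a basis $e_1, \dots, e_n$ of V consisting of eigenvectors
of T, with corresponding eigenvalues $\lambda_1, \dots, \lambda_n$. If $\alpha \in V^{(n)}_{\mathrm{alt}}$, then

\[ \alpha_T(e_1, \dots, e_n) = \alpha(\lambda_1 e_1, \dots, \lambda_n e_n) = (\lambda_1 \cdots \lambda_n)\,\alpha(e_1, \dots, e_n). \]

If $\alpha \ne 0$, then 9.39 implies $\alpha(e_1, \dots, e_n) \ne 0$. Thus the equation above implies

\[ \det T = \lambda_1 \cdots \lambda_n. \]

Our next task is to define and give a formula for the determinant of a square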
matrix. To do this, we associate with each square matrix an operator and then 
define the determinant of the matrix to be the determinant of the associated 
operator. 


9.43 definition: determinant of a matrix, det A 


Suppose that n is a positive integer and A is an n-by-n square matrix with
entries in $\mathbf{F}$. Let $T \in \mathcal{L}(\mathbf{F}^n)$ be the operator whose matrix with respect to
the standard basis of $\mathbf{F}^n$ equals A. The determinant of A, denoted by det A, is
defined by det A = det T.


9.44 example: determinants of matrices 


• If I is the n-by-n identity matrix, then the corresponding operator on $\mathbf{F}^n$ is the
identity operator I on $\mathbf{F}^n$. Thus the first bullet point of 9.42 implies that the
determinant of the identity matrix is 1.

• Suppose A is a diagonal matrix with $\lambda_1, \dots, \lambda_n$ on the diagonal. Then the
corresponding operator on $\mathbf{F}^n$ has the standard basis of $\mathbf{F}^n$ as eigenvectors,
with eigenvalues $\lambda_1, \dots, \lambda_n$. Thus the last bullet point of 9.42 implies that
$\det A = \lambda_1 \cdots \lambda_n$.


For the next result, think of each list $v_1, \dots, v_n$ of n vectors in $\mathbf{F}^n$ as a list of
n-by-1 column vectors. The notation $(\, v_1 \ \cdots \ v_n \,)$ then denotes the n-by-n
square matrix whose $k^{\text{th}}$ column is $v_k$ for each $k = 1, \dots, n$.


9.45 determinant is an alternating multilinear form 


Suppose that n is a positive integer. The map that takes a list $v_1, \dots, v_n$ of
vectors in $\mathbf{F}^n$ to $\det(\, v_1 \ \cdots \ v_n \,)$ is an alternating n-linear form on $\mathbf{F}^n$.


Proof Let $e_1, \dots, e_n$ be the standard basis of $\mathbf{F}^n$ and suppose $v_1, \dots, v_n$ is a list of
vectors in $\mathbf{F}^n$. Let $T \in \mathcal{L}(\mathbf{F}^n)$ be the operator such that $Te_k = v_k$ for $k = 1, \dots, n$.
Thus T is the operator whose matrix with respect to $e_1, \dots, e_n$ is $(\, v_1 \ \cdots \ v_n \,)$.
Hence $\det(\, v_1 \ \cdots \ v_n \,) = \det T$, by definition of the determinant of a matrix.
Let $\alpha$ be an alternating n-linear form on $\mathbf{F}^n$ such that $\alpha(e_1, \dots, e_n) = 1$. Then

\[
\begin{aligned}
\det(\, v_1 \ \cdots \ v_n \,) &= \det T \\
&= (\det T)\,\alpha(e_1, \dots, e_n) \\
&= \alpha(Te_1, \dots, Te_n) \\
&= \alpha(v_1, \dots, v_n),
\end{aligned}
\]

where the third line follows from the definition of the determinant of an operator.
The equation above shows that the map that takes a list of vectors $v_1, \dots, v_n$ in $\mathbf{F}^n$
to $\det(\, v_1 \ \cdots \ v_n \,)$ is the alternating n-linear form $\alpha$ on $\mathbf{F}^n$.

The previous result has several important consequences. For example, it
immediately implies that a matrix with two identical columns has determinant 0.
We will come back to other consequences later, but for now we want to give a
formula for the determinant of a square matrix. Recall that if A is a matrix, then
$A_{j,k}$ denotes the entry in row j, column k of A.


9.46 formula for determinant of a matrix 


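Suppose that n is a positive integer and A is an n-by-n square matrix. Then

\[ \det A = \sum_{(j_1, \dots, j_n) \in \operatorname{perm} n} \bigl(\operatorname{sign}(j_1, \dots, j_n)\bigr)\, A_{j_1,1} \cdots A_{j_n,n}. \]


Proof Apply 9.36 with $V = \mathbf{F}^n$ and $e_1, \dots, e_n$ the standard basis of $\mathbf{F}^n$ and $\alpha$ the
alternating n-linear form on $\mathbf{F}^n$ that takes $v_1, \dots, v_n$ to $\det(\, v_1 \ \cdots \ v_n \,)$ [see
9.45]. If each $v_k$ is the $k^{\text{th}}$ column of A, then each $b_{j,k}$ in 9.36 equals $A_{j,k}$. Finally,

\[ \alpha(e_1, \dots, e_n) = \det(\, e_1 \ \cdots \ e_n \,) = \det I = 1. \]

Thus the formula in 9.36 becomes the formula stated in this result.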


9.47 example: explicit formula for determinant 


• If A is a 2-by-2 matrix, then the formula in 9.46 becomes

\[ \det A = A_{1,1}A_{2,2} - A_{2,1}A_{1,2}. \]

• If A is a 3-by-3 matrix, then the formula in 9.46 becomes

\[
\begin{aligned}
\det A = {} & A_{1,1}A_{2,2}A_{3,3} - A_{2,1}A_{1,2}A_{3,3} - A_{3,1}A_{2,2}A_{1,3} \\
& - A_{1,1}A_{3,2}A_{2,3} + A_{3,1}A_{1,2}A_{2,3} + A_{2,1}A_{3,2}A_{1,3}.
\end{aligned}
\]


The sum in the formula in 9.46 contains n! terms. Because n! grows rapidly as
n increases, the formula in 9.46 is not a viable method to evaluate determinants
even for moderately sized n. For example, 10! is over three million, and 100! is
approximately $10^{158}$, leading to a sum that the fastest computer cannot evaluate.
We will soon see some results that lead to faster evaluations of determinants than
direct use of the sum in 9.46.
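For small matrices the formula in 9.46 can be evaluated literally. The sketch below does exactly that, looping over all n! permutations, so it is meant only to illustrate the formula and its cost, not as a practical method. The function names `sign` and `det` and the sample matrix are choices made here; indices are 0-based in the code.

```python
from itertools import permutations
from math import factorial


def sign(perm):
    """Sign of a permutation, computed by counting out-of-order pairs (9.32)."""
    m = len(perm)
    n_out_of_order = sum(1 for a in range(m) for b in range(a + 1, m)
                         if perm[a] > perm[b])
    return (-1) ** n_out_of_order


def det(A):
    """Determinant of a square matrix (list of rows) via the sum in 9.46."""
    n = len(A)
    total = 0
    # One term for each permutation (j_1, ..., j_n).
    for js in permutations(range(n)):
        term = sign(js)
        for k in range(n):
            term *= A[js[k]][k]  # entry in row j_k, column k
        total += term
    return total


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
print(det(A))         # -3
print(factorial(10))  # 3628800 terms would be needed for a 10-by-10 matrix
```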


9.48 determinant of upper-triangular matrix 


Suppose that A is an upper-triangular matrix with $\lambda_1, \dots, \lambda_n$ on the diagonal.
Then $\det A = \lambda_1 \cdots \lambda_n$.


Proof If $(j_1, \dots, j_n) \in \operatorname{perm} n$ with $(j_1, \dots, j_n) \ne (1, \dots, n)$, then $j_k > k$ for some
$k \in \{1, \dots, n\}$, which implies that $A_{j_k,k} = 0$. Thus the only permutation that
can make a nonzero contribution to the sum in 9.46 is the permutation $(1, \dots, n)$.
Because $A_{k,k} = \lambda_k$ for each $k = 1, \dots, n$, this implies that $\det A = \lambda_1 \cdots \lambda_n$.

Properties of Determinants


Our definition of the determinant leads to the following magical proof that the 
determinant is multiplicative. 


9.49 determinant is multiplicative 


(a) Suppose $S, T \in \mathcal{L}(V)$. Then $\det(ST) = (\det S)(\det T)$.

(b) Suppose A and B are square matrices of the same size. Then

\[ \det(AB) = (\det A)(\det B). \]


Proof
(a) Let n = dim V. Suppose $\alpha \in V^{(n)}_{\mathrm{alt}}$ and $v_1, \dots, v_n \in V$. Then

\[
\begin{aligned}
\alpha_{ST}(v_1, \dots, v_n) &= \alpha(STv_1, \dots, STv_n) \\
&= (\det S)\,\alpha(Tv_1, \dots, Tv_n) \\
&= (\det S)(\det T)\,\alpha(v_1, \dots, v_n),
\end{aligned}
\]

where the first equation follows from the definition of $\alpha_{ST}$, the second equation
follows from the definition of det S, and the third equation follows from the
definition of det T. The equation above implies that $\det(ST) = (\det S)(\det T)$.

(b) Let $S, T \in \mathcal{L}(\mathbf{F}^n)$ be such that $\mathcal{M}(S) = A$ and $\mathcal{M}(T) = B$, where all matrices
of operators in this proof are with respect to the standard basis of $\mathbf{F}^n$. Then
$\mathcal{M}(ST) = \mathcal{M}(S)\mathcal{M}(T) = AB$ (see 3.43). Thus

\[ \det(AB) = \det(ST) = (\det S)(\det T) = (\det A)(\det B), \]

where the second equality comes from the result in (a).
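Part (b) is easy to spot-check on concrete matrices. The sketch below is a numerical illustration only: it evaluates determinants with the permutation sum of 9.46 and compares $\det(AB)$ with $(\det A)(\det B)$ for one pair of integer matrices. The helper names and the sample matrices are made up for this illustration.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation, computed by counting out-of-order pairs (9.32)."""
    m = len(perm)
    n_out_of_order = sum(1 for a in range(m) for b in range(a + 1, m)
                         if perm[a] > perm[b])
    return (-1) ** n_out_of_order


def det(A):
    """Determinant of a square matrix (list of rows) via the sum in 9.46."""
    n = len(A)
    total = 0
    for js in permutations(range(n)):
        term = sign(js)
        for k in range(n):
            term *= A[js[k]][k]
        total += term
    return total


def matmul(A, B):
    """Product of two square matrices of the same size, given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
assert det(matmul(A, B)) == det(A) * det(B)
```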


The determinant of an operator determines whether the operator is invertible. 


9.50 invertible $\iff$ nonzero determinant


An operator $T \in \mathcal{L}(V)$ is invertible if and only if $\det T \ne 0$. Furthermore, if
T is invertible, then $\det(T^{-1}) = \dfrac{1}{\det T}$.


Proof First suppose T is invertible. Thus $TT^{-1} = I$. Now 9.49 implies that

\[ 1 = \det I = \det(TT^{-1}) = (\det T)\bigl(\det(T^{-1})\bigr). \]

Hence $\det T \ne 0$ and $\det(T^{-1})$ is the multiplicative inverse of det T.
To prove the other direction, now suppose $\det T \ne 0$. Suppose $v \in V$ and
$v \ne 0$. Let $v, e_2, \dots, e_n$ be a basis of V and let $\alpha \in V^{(n)}_{\mathrm{alt}}$ be such that $\alpha \ne 0$. Then
$\alpha(v, e_2, \dots, e_n) \ne 0$ (by 9.39). Now

\[ \alpha(Tv, Te_2, \dots, Te_n) = (\det T)\,\alpha(v, e_2, \dots, e_n) \ne 0. \]

Thus $Tv \ne 0$. Hence T is invertible.

An n-by-n matrix A is invertible (see 3.80 for the definition of an invertible
matrix) if and only if the operator on $\mathbf{F}^n$ associated with A (via the standard basis
of $\mathbf{F}^n$) is invertible. Thus the previous result shows that a square matrix A is
invertible if and only if $\det A \ne 0$.


9.51 eigenvalues and determinants 


Suppose $T \in \mathcal{L}(V)$ and $\lambda \in \mathbf{F}$. Then $\lambda$ is an eigenvalue of T if and only if
$\det(\lambda I - T) = 0$.


Proof The number $\lambda$ is an eigenvalue of T if and only if $T - \lambda I$ is not invertible
(see 5.7), which happens if and only if $\lambda I - T$ is not invertible, which happens if
and only if $\det(\lambda I - T) = 0$ (by 9.50).


Suppose $T \in \mathcal{L}(V)$ and $S\colon W \to V$ is an invertible linear map. To prove that
$\det(S^{-1}TS) = \det T$, we could try to use 9.49 and 9.50, writing

$$
\begin{aligned}
\det(S^{-1}TS) &= (\det S^{-1})(\det T)(\det S) \\
&= \det T.
\end{aligned}
$$

That proof works if $W = V$, but if $W \ne V$ then it makes no sense because the
determinant is defined only for linear maps from a vector space to itself, and $S$
maps $W$ to $V$, making $\det S$ undefined. The proof given below works around this
issue and is valid when $W \ne V$.


9.52 determinant is a similarity invariant 


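Suppose $T \in \mathcal{L}(V)$ and $S\colon W \to V$ is an invertible linear map. Then

$$\det(S^{-1}TS) = \det T.$$


Proof Let $n = \dim W = \dim V$. Suppose $\tau \in W^{(n)}_{\mathrm{alt}}$. Define $\alpha \in V^{(n)}_{\mathrm{alt}}$ by

$$\alpha(v_1, \dots, v_n) = \tau(S^{-1}v_1, \dots, S^{-1}v_n)$$

for $v_1, \dots, v_n \in V$. Suppose $w_1, \dots, w_n \in W$. Then

$$
\begin{aligned}
\tau_{S^{-1}TS}(w_1, \dots, w_n) &= \tau(S^{-1}TSw_1, \dots, S^{-1}TSw_n) \\
&= \alpha(TSw_1, \dots, TSw_n) \\
&= \alpha_T(Sw_1, \dots, Sw_n) \\
&= (\det T)\,\alpha(Sw_1, \dots, Sw_n) \\
&= (\det T)\,\tau(w_1, \dots, w_n).
\end{aligned}
$$

The equation above and the definition of the determinant of the operator $S^{-1}TS$
imply that $\det(S^{-1}TS) = \det T$.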
For the special case in which $V = \mathbf{F}^n$ and $e_1, \dots, e_n$ is the standard basis of $\mathbf{F}^n$,
the next result is true by the definition of the determinant of a matrix. The left
side of the equation in the next result does not depend on a choice of basis, which
means that the right side is independent of the choice of basis.


9.53 determinant of operator equals determinant of its matrix 


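Suppose $T \in \mathcal{L}(V)$ and $e_1, \dots, e_n$ is a basis of $V$. Then

$$\det T = \det \mathcal{M}\bigl(T, (e_1, \dots, e_n)\bigr).$$


Proof Let $f_1, \dots, f_n$ be the standard basis of $\mathbf{F}^n$. Let $S\colon \mathbf{F}^n \to V$ be the linear
map such that $Sf_k = e_k$ for each $k = 1, \dots, n$. Thus $\mathcal{M}\bigl(S, (f_1, \dots, f_n), (e_1, \dots, e_n)\bigr)$
and $\mathcal{M}\bigl(S^{-1}, (e_1, \dots, e_n), (f_1, \dots, f_n)\bigr)$ both equal the $n$-by-$n$ identity matrix. Hence

9.54   $\mathcal{M}\bigl(S^{-1}TS, (f_1, \dots, f_n)\bigr) = \mathcal{M}\bigl(T, (e_1, \dots, e_n)\bigr),$

as follows from two applications of 3.43. Thus

$$
\begin{aligned}
\det T &= \det(S^{-1}TS) \\
&= \det \mathcal{M}\bigl(S^{-1}TS, (f_1, \dots, f_n)\bigr) \\
&= \det \mathcal{M}\bigl(T, (e_1, \dots, e_n)\bigr),
\end{aligned}
$$

where the first line comes from 9.52, the second line comes from the definition of
the determinant of a matrix, and the third line follows from 9.54.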


The next result gives a more intuitive way to think about determinants than the 
definition or the formula in 9.46. We could make the characterization in the result 
below the definition of the determinant of an operator on a finite-dimensional 
complex vector space, with the current definition then becoming a consequence 
of that definition. 


9.55 if $\mathbf{F} = \mathbf{C}$, then determinant equals product of eigenvalues


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Then $\det T$ equals the product of the
eigenvalues of $T$, with each eigenvalue included as many times as its multiplicity.


Proof There is a basis of V with respect to which T has an upper-triangular 
matrix with the diagonal entries of the matrix consisting of the eigenvalues of T, 
with each eigenvalue included as many times as its multiplicity—see 8.37. Thus 
9.53 and 9.48 imply that det T equals the product of the eigenvalues of T, with 
each eigenvalue included as many times as its multiplicity. 


As the next result shows, the determinant interacts nicely with the transpose of 
a square matrix, with the dual of an operator, and with the adjoint of an operator 
on an inner product space. 
9.56 determinant of transpose, dual, or adjoint


(a) Suppose $A$ is a square matrix. Then $\det A^{t} = \det A$.

(b) Suppose $T \in \mathcal{L}(V)$. Then $\det T' = \det T$.

(c) Suppose $V$ is an inner product space and $T \in \mathcal{L}(V)$. Then

$$\det(T^*) = \overline{\det T}.$$


Proof

(a) Let $n$ be a positive integer. Define $\alpha\colon (\mathbf{F}^n)^n \to \mathbf{F}$ by

$$\alpha\bigl((\, v_1 \ \cdots \ v_n \,)\bigr) = \det\bigl((\, v_1 \ \cdots \ v_n \,)^{t}\bigr)$$

for all $v_1, \dots, v_n \in \mathbf{F}^n$. The formula in 9.46 for the determinant of a matrix
shows that $\alpha$ is an $n$-linear form on $\mathbf{F}^n$.

Suppose $v_1, \dots, v_n \in \mathbf{F}^n$ and $v_j = v_k$ for some $j \ne k$. If $B$ is an $n$-by-$n$ matrix,
then $(\, v_1 \ \cdots \ v_n \,)^{t}B$ cannot equal the identity matrix because row $j$ and
row $k$ of $(\, v_1 \ \cdots \ v_n \,)^{t}B$ are equal. Thus $(\, v_1 \ \cdots \ v_n \,)^{t}$ is not invertible,
which implies that $\alpha\bigl((\, v_1 \ \cdots \ v_n \,)\bigr) = 0$. Hence $\alpha$ is an alternating
$n$-linear form on $\mathbf{F}^n$.

Note that $\alpha$ applied to the standard basis of $\mathbf{F}^n$ equals 1. Because the vector
space of alternating $n$-linear forms on $\mathbf{F}^n$ has dimension one (by 9.37), this
implies that $\alpha$ is the determinant function. Thus (a) holds.

(b) The equation $\det T' = \det T$ follows from (a) and 9.53 and 3.132.

(c) Pick an orthonormal basis of $V$. The matrix of $T^*$ with respect to that basis is
the conjugate transpose of the matrix of $T$ with respect to that basis (by 7.9).
Thus 9.53, 9.46, and (a) imply that $\det(T^*) = \overline{\det T}$.


9.57 helpful results in evaluating determinants 


(a) If either two columns or two rows of a square matrix are equal, then the
determinant of the matrix equals 0.

(b) Suppose $A$ is a square matrix and $B$ is the matrix obtained from $A$ by
swapping either two columns or two rows. Then $\det A = -\det B$.

(c) If one column or one row of a square matrix is multiplied by a scalar, then
the value of the determinant is multiplied by the same scalar.

(d) If a scalar multiple of one column of a square matrix is added to another
column, then the value of the determinant is unchanged.

(e) If a scalar multiple of one row of a square matrix is added to another row,
then the value of the determinant is unchanged.
Proof All the assertions in this result follow from the result that the maps
$v_1, \dots, v_n \mapsto \det(\, v_1 \ \cdots \ v_n \,)$ and $v_1, \dots, v_n \mapsto \det(\, v_1 \ \cdots \ v_n \,)^{t}$ are both
alternating $n$-linear forms on $\mathbf{F}^n$ [see 9.45 and 9.56(a)].

For example, to prove (d) suppose $v_1, \dots, v_n \in \mathbf{F}^n$ and $c \in \mathbf{F}$. Then

$$
\begin{aligned}
\det(\, v_1 + cv_2 \ \ v_2 \ \cdots \ v_n \,)
&= \det(\, v_1 \ \ v_2 \ \cdots \ v_n \,) + c \det(\, v_2 \ \ v_2 \ \ v_3 \ \cdots \ v_n \,) \\
&= \det(\, v_1 \ \ v_2 \ \cdots \ v_n \,),
\end{aligned}
$$


where the first equation follows from the multilinearity property and the second 
equation follows from the alternating property. The equation above shows that 
adding a multiple of the second column to the first column does not change the 
value of the determinant. The same conclusion holds for any two columns. Thus 
(d) holds. 

The proof of (e) follows from (d) and from 9.56(a). The proofs of (a), (b), and 
(c) use similar tools and are left to the reader. 


For matrices whose entries are concrete numbers, the result above leads to a
much faster way to evaluate the determinant than direct application of the formula
in 9.46. Specifically, apply the Gaussian elimination procedure of swapping
rows [by 9.57(b), this changes the determinant by a factor of $-1$], multiplying
a row by a nonzero constant [by 9.57(c), this changes the determinant by the
same constant], and adding a multiple of one row to another row [by 9.57(e), this
does not change the determinant] to produce an upper-triangular matrix, whose
determinant is the product of the diagonal entries (by 9.48). If your software keeps
track of the number of row swaps and of the constants used when multiplying a
row by a constant, then the determinant of the original matrix can be computed.
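
The procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the book: the function name `det_by_elimination` is made up for this example, exact rational arithmetic from the standard `fractions` module is assumed so that tests for zero are reliable, and the sketch uses only row swaps and row additions (it never rescales a row, so no scaling constants need to be tracked).

```python
from fractions import Fraction


def det_by_elimination(rows):
    """Determinant of a square matrix via row reduction to upper-triangular form."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available: the determinant is 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign  # a row swap multiplies the determinant by -1
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            # Adding a multiple of one row to another leaves the determinant unchanged.
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    # The matrix is now upper triangular: multiply the diagonal entries.
    product = Fraction(sign)
    for k in range(n):
        product *= a[k][k]
    return product
```

For instance, `det_by_elimination([[0, 1], [1, 0]])` performs one row swap and returns $-1$.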

Because a number $\lambda \in \mathbf{F}$ is an eigenvalue of an operator $T \in \mathcal{L}(V)$ if and
only if $\det(\lambda I - T) = 0$ (by 9.51), you may be tempted to think that one way
to find eigenvalues quickly is to choose a basis of $V$, let $A = \mathcal{M}(T)$, evaluate
$\det(\lambda I - A)$, and then solve the equation $\det(\lambda I - A) = 0$ for $\lambda$. However, that
procedure is rarely efficient, except when $\dim V = 2$ (or when $\dim V$ equals 3 or
4 if you are willing to use the cubic or quartic formulas). One problem is that the
procedure described in the paragraph above for evaluating a determinant does not
work when the matrix includes a symbol (such as the $\lambda$ in $\lambda I - A$). This problem
arises because decisions need to be made in the Gaussian elimination procedure
about whether certain quantities equal 0, and those decisions become complicated
in expressions involving a symbol $\lambda$.
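
As a small illustration of the case $\dim V = 2$, where the procedure is practical, here is a worked example. The matrix $A$ below is chosen only for illustration and does not come from the text.

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
\qquad
\det(\lambda I - A)
  = \det \begin{pmatrix} \lambda - 2 & -1 \\ -1 & \lambda - 2 \end{pmatrix}
  = (\lambda - 2)^2 - 1
  = (\lambda - 1)(\lambda - 3).
```

Hence the eigenvalues of this particular $A$ are 1 and 3.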

Recall that an operator on a finite-dimensional inner product space is unitary 
if it preserves norms (see 7.51 and the paragraph following it). Every eigenvalue 
of a unitary operator has absolute value 1 (by 7.54). Thus the product of the 
eigenvalues of a unitary operator has absolute value 1. Hence (at least in the case 
F = C) the determinant of a unitary operator has absolute value 1 (by 9.55). The 
next result gives a proof that works without the assumption that F = C. 
9.58 every unitary operator has determinant with absolute value 1


Suppose $V$ is an inner product space and $S \in \mathcal{L}(V)$ is a unitary operator.
Then $|\det S| = 1$.


Proof Because $S$ is unitary, $I = S^*S$ (see 7.53). Thus

$$1 = \det(S^*S) = (\det S^*)(\det S) = \overline{(\det S)}\,(\det S) = |\det S|^2,$$


where the second equality comes from 9.49(a) and the third equality comes from 
9.56(c). The equation above implies that |det S| = 1. 


The determinant of a positive operator on an inner product space meshes well 
with the analogy that such operators correspond to the nonnegative real numbers. 


9.59 every positive operator has nonnegative determinant 


Suppose $V$ is an inner product space and $T \in \mathcal{L}(V)$ is a positive operator.
Then $\det T \ge 0$.


Proof By the spectral theorem (7.29 or 7.31), $V$ has an orthonormal basis
consisting of eigenvectors of $T$. Thus by the last bullet point of 9.42, $\det T$ equals a
product of the eigenvalues of $T$, possibly with repetitions. Each eigenvalue of $T$ is
a nonnegative number (by 7.38). Thus we conclude that $\det T \ge 0$.


Suppose $V$ is an inner product space and $T \in \mathcal{L}(V)$. Recall that the list of
nonnegative square roots of the eigenvalues of T*T (each included as many times 
as its multiplicity) is called the list of singular values of T (see Section 7E). 


9.60 $|\det T|$ = product of singular values of $T$


Suppose $V$ is an inner product space and $T \in \mathcal{L}(V)$. Then

$$|\det T| = \sqrt{\det(T^*T)} = \text{product of singular values of } T.$$


Proof We have

$$|\det T|^2 = \overline{(\det T)}\,(\det T) = \bigl(\det(T^*)\bigr)(\det T) = \det(T^*T),$$

where the middle equality comes from 9.56(c) and the last equality comes from
9.49(a). Taking square roots of both sides of the equation above shows that
$|\det T| = \sqrt{\det(T^*T)}$.

Let $s_1, \dots, s_n$ denote the list of singular values of $T$. Thus $s_1^2, \dots, s_n^2$ is the
list of eigenvalues of $T^*T$ (with appropriate repetitions), corresponding to an
orthonormal basis of $V$ consisting of eigenvectors of $T^*T$. Hence the last bullet
point of 9.42 implies that

$$\det(T^*T) = s_1^2 \cdots s_n^2.$$

Thus $|\det T| = s_1 \cdots s_n$, as desired.
An operator $T$ on a real inner product space changes volume by a factor of the
product of the singular values (by 7.111). Thus the next result follows immediately 
from 7.111 and 9.60. This result explains why the absolute value of a determinant 
appears in the change of variables formula in multivariable calculus. 


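9.61 $T$ changes volume by factor of $|\det T|$


Suppose $T \in \mathcal{L}(\mathbf{R}^n)$ and $\Omega \subseteq \mathbf{R}^n$. Then

$$\operatorname{volume} T(\Omega) = |\det T|\,(\operatorname{volume} \Omega).$$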


For operators on finite-dimensional complex vector spaces, we now connect 
the determinant to a polynomial that we have previously seen. 


9.62 if $\mathbf{F} = \mathbf{C}$, then characteristic polynomial of $T$ equals $\det(zI - T)$


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Let $\lambda_1, \dots, \lambda_m$ denote the distinct eigenvalues
of $T$, and let $d_1, \dots, d_m$ denote their multiplicities. Then

$$\det(zI - T) = (z - \lambda_1)^{d_1} \cdots (z - \lambda_m)^{d_m}.$$


Proof There exists a basis of $V$ with respect to which $T$ has an upper-triangular
matrix with each $\lambda_k$ appearing on the diagonal exactly $d_k$ times (by 8.37). With
respect to this basis, $zI - T$ has an upper-triangular matrix with $z - \lambda_k$ appearing
on the diagonal exactly $d_k$ times for each $k$. Thus 9.48 gives the desired equation.


Suppose $\mathbf{F} = \mathbf{C}$ and $T \in \mathcal{L}(V)$. The characteristic polynomial of $T$ was
defined in 8.26 as the polynomial on the right side of the equation in 9.62. We 
did not previously define the characteristic polynomial of an operator on a finite- 
dimensional real vector space because such operators may have no eigenvalues, 
making a definition using the right side of the equation in 9.62 inappropriate. 

We now present a new definition of the characteristic polynomial, motivated 
by 9.62. This new definition is valid for both real and complex vector spaces. 
The equation in 9.62 shows that this new definition is equivalent to our previous 
definition when F = C (8.26). 


9.63 definition: characteristic polynomial 


Suppose $T \in \mathcal{L}(V)$. The polynomial defined by

$$z \mapsto \det(zI - T)$$

is called the characteristic polynomial of $T$.


The formula in 9.46 shows that the characteristic polynomial of an operator
$T \in \mathcal{L}(V)$ is a monic polynomial of degree $\dim V$. The zeros in $\mathbf{F}$ of the
characteristic polynomial of $T$ are exactly the eigenvalues of $T$ (by 9.51).
Previously we proved the Cayley–Hamilton theorem (8.29) in the complex
case. Now we can extend that result to operators on real vector spaces.


9.64 Cayley–Hamilton theorem


Suppose $T \in \mathcal{L}(V)$ and $q$ is the characteristic polynomial of $T$. Then $q(T) = 0$.


Proof If $\mathbf{F} = \mathbf{C}$, then the equation $q(T) = 0$ follows from 9.62 and 8.29.

Now suppose $\mathbf{F} = \mathbf{R}$. Fix a basis of $V$, and let $A$ be the matrix of $T$ with
respect to this basis. Let $S$ be the operator on $\mathbf{C}^{\dim V}$ such that the matrix of $S$
(with respect to the standard basis of $\mathbf{C}^{\dim V}$) is $A$. For all $z \in \mathbf{R}$ we have

$$q(z) = \det(zI - T) = \det(zI - A) = \det(zI - S).$$

Thus $q$ is the characteristic polynomial of $S$. The case $\mathbf{F} = \mathbf{C}$ (first sentence of
this proof) now implies that $0 = q(S) = q(A) = q(T)$.
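
The theorem can be checked numerically on small matrices. The sketch below is not taken from the book: it computes the coefficients of $\det(zI - A)$ with the Faddeev–LeVerrier recurrence (a different technique from the permutation formula the text cites as 9.46, chosen here because it is short), and the names `mat_mul`, `char_poly`, and `eval_poly_at_matrix` are hypothetical helpers introduced only for this example. Exact rational arithmetic is assumed.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two n-by-n matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(rows):
    """Coefficients [c_0, ..., c_n] with det(zI - A) = c_0 + c_1 z + ... + c_n z^n."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)  # the characteristic polynomial is monic
    m = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    for k in range(1, n + 1):
        # Faddeev-LeVerrier step: M_k = A M_{k-1} + c_{n-k+1} I
        m = mat_mul(a, m)
        for i in range(n):
            m[i][i] += coeffs[n - k + 1]
        # c_{n-k} = -(1/k) * trace(A M_k)
        am = mat_mul(a, m)
        coeffs[n - k] = -sum(am[i][i] for i in range(n)) / k
    return coeffs


def eval_poly_at_matrix(coeffs, rows):
    """Evaluate c_0 I + c_1 A + ... + c_n A^n by Horner's rule."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, a)
        for i in range(n):
            result[i][i] += c
    return result
```

For the rotation matrix `[[0, -1], [1, 0]]`, which has no real eigenvalues, `char_poly` returns the coefficients of $z^2 + 1$, and `eval_poly_at_matrix` applied to those coefficients and the same matrix returns the zero matrix, as the theorem predicts for the real case.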


The Cayley–Hamilton theorem (9.64) implies that the characteristic polynomial
of an operator $T \in \mathcal{L}(V)$ is a polynomial multiple of the minimal polynomial
of $T$ (by 5.29). Thus if the degree of the minimal polynomial of $T$ equals $\dim V$,
then the characteristic polynomial of $T$ equals the minimal polynomial of $T$. This
happens for a very large percentage of operators, including over 99.999% of
4-by-4 matrices with integer entries in $[-100, 100]$ (see the paragraph following
3.28).

The last sentence in our next result was previously proved in the complex case 
(see 8.54). Now we can give a proof that works on both real and complex vector 
spaces. 


9.65 characteristic polynomial, trace, and determinant 


Suppose $T \in \mathcal{L}(V)$. Let $n = \dim V$. Then the characteristic polynomial of $T$
can be written as

$$z^n - (\operatorname{tr} T)z^{n-1} + \cdots + (-1)^n(\det T).$$


Proof The constant term of a polynomial function of $z$ is the value of the
polynomial when $z = 0$. Thus the constant term of the characteristic polynomial of $T$
equals $\det(-T)$, which equals $(-1)^n \det T$ (by the third bullet point of 9.42).

Fix a basis of $V$, and let $A$ be the matrix of $T$ with respect to this basis. The
matrix of $zI - T$ with respect to this basis is $zI - A$. The term coming from the
identity permutation $(1, \dots, n)$ in the formula 9.46 for $\det(zI - A)$ is

$$(z - A_{1,1}) \cdots (z - A_{n,n}).$$

The coefficient of $z^{n-1}$ in the expression above is $-(A_{1,1} + \cdots + A_{n,n})$, which equals
$-\operatorname{tr} T$. The terms in the formula for $\det(zI - A)$ coming from other elements of
$\operatorname{perm} n$ contain at most $n - 2$ factors of the form $z - A_{k,k}$ and thus do not contribute
to the coefficient of $z^{n-1}$ in the characteristic polynomial of $T$.
In the result below, think of the columns of the $n$-by-$n$ matrix $A$ as elements
of $\mathbf{F}^n$. The norms appearing below then arise from the standard inner product
on $\mathbf{F}^n$. Recall that the notation $R_{\cdot,k}$ in the proof below means the $k$th column
of the matrix $R$ (as was defined in Chapter 3).

The next result was proved by Jacques Hadamard (1865–1963) in 1893.


9.66 Hadamard’s inequality 


Suppose $A$ is an $n$-by-$n$ matrix. Let $v_1, \dots, v_n$ denote the columns of $A$. Then

$$|\det A| \le \prod_{k=1}^{n} \|v_k\|.$$


Proof If $A$ is not invertible, then $\det A = 0$ and hence the desired inequality
holds in this case.

Thus assume that $A$ is invertible. The QR factorization (7.58) tells us that
there exist a unitary matrix $Q$ and an upper-triangular matrix $R$ whose diagonal
contains only positive numbers such that $A = QR$. We have

$$
\begin{aligned}
|\det A| &= |\det Q|\,|\det R| \\
&= |\det R| \\
&= \prod_{k=1}^{n} R_{k,k} \\
&\le \prod_{k=1}^{n} \|R_{\cdot,k}\| \\
&= \prod_{k=1}^{n} \|QR_{\cdot,k}\| \\
&= \prod_{k=1}^{n} \|v_k\|,
\end{aligned}
$$

where the first line comes from 9.49(b), the second line comes from 9.58, the
third line comes from 9.48, and the fifth line holds because $Q$ is an isometry.
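
Hadamard's inequality is easy to test on small real matrices. The following sketch is an illustration only, with hypothetical helper names (`sign`, `det`, `hadamard_bound`); it evaluates the determinant as a signed sum over permutations, in the spirit of the formula the text cites as 9.46, and compares its absolute value with the product of the Euclidean norms of the columns.

```python
from itertools import permutations
from math import prod, sqrt


def sign(perm):
    """Sign of a permutation given as a tuple of 0-based indices."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det(a):
    """Determinant as a signed sum over permutations (practical only for small n)."""
    n = len(a)
    return sum(
        sign(p) * prod(a[p[k]][k] for k in range(n))
        for p in permutations(range(n))
    )


def hadamard_bound(a):
    """Product of the Euclidean norms of the columns of a real square matrix."""
    n = len(a)
    return prod(sqrt(sum(a[j][k] ** 2 for j in range(n))) for k in range(n))
```

With `a = [[1, 2], [3, 4]]`, the determinant is $-2$ while the bound is $\sqrt{10}\,\sqrt{20} \approx 14.1$. With the orthogonal columns of `[[1, -1], [1, 1]]`, both sides equal 2 up to floating-point rounding.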


To give a geometric interpretation to Hadamard's inequality, suppose $\mathbf{F} = \mathbf{R}$.
Let $T \in \mathcal{L}(\mathbf{R}^n)$ be the operator such that $Te_k = v_k$ for each $k = 1, \dots, n$, where
$e_1, \dots, e_n$ is the standard basis of $\mathbf{R}^n$. Then $T$ maps the box $P(e_1, \dots, e_n)$ onto the
parallelepiped $P(v_1, \dots, v_n)$ [see 7.102 and 7.105 for a review of this notation
and terminology]. Because the box $P(e_1, \dots, e_n)$ has volume 1, this implies (by
9.61) that the parallelepiped $P(v_1, \dots, v_n)$ has volume $|\det T|$, which equals $|\det A|$.
Thus Hadamard's inequality above can be interpreted to say that among all
parallelepipeds whose edges have lengths $\|v_1\|, \dots, \|v_n\|$, the ones with largest volume
have orthogonal edges (and thus have volume $\prod_{k=1}^{n} \|v_k\|$).

For a necessary and sufficient condition for Hadamard’s inequality to be an 
equality, see Exercise 18. 


366 Chapter 9 Multilinear Algebra and Determinants 


The matrix in the next result is called the Vandermonde matrix. Vandermonde 
matrices have important applications in polynomial interpolation, the discrete 
Fourier transform, and other areas of mathematics. The proof of the next result is 
a nice illustration of the power of switching between matrices and linear maps. 


9.67 determinant of Vandermonde matrix 


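Suppose $n > 1$ and $\beta_1, \dots, \beta_n \in \mathbf{F}$. Then

$$
\det \begin{pmatrix}
1 & \beta_1 & \beta_1^2 & \cdots & \beta_1^{n-1} \\
1 & \beta_2 & \beta_2^2 & \cdots & \beta_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \beta_n & \beta_n^2 & \cdots & \beta_n^{n-1}
\end{pmatrix}
= \prod_{1 \le j < k \le n} (\beta_k - \beta_j).
$$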

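Proof Let $1, z, \dots, z^{n-1}$ be the standard basis of $\mathcal{P}_{n-1}(\mathbf{F})$ and let $e_1, \dots, e_n$ denote
the standard basis of $\mathbf{F}^n$. Define a linear map $S\colon \mathcal{P}_{n-1}(\mathbf{F}) \to \mathbf{F}^n$ by

$$Sp = \bigl(p(\beta_1), \dots, p(\beta_n)\bigr).$$

Let $A$ denote the Vandermonde matrix shown in the statement of this result.
Note that

$$A = \mathcal{M}\bigl(S, (1, z, \dots, z^{n-1}), (e_1, \dots, e_n)\bigr).$$

Let $T\colon \mathcal{P}_{n-1}(\mathbf{F}) \to \mathcal{P}_{n-1}(\mathbf{F})$ be the operator on $\mathcal{P}_{n-1}(\mathbf{F})$ such that $T1 = 1$
and

$$Tz^k = (z - \beta_1)(z - \beta_2) \cdots (z - \beta_k)$$

for $k = 1, \dots, n-1$. Let $B = \mathcal{M}\bigl(T, (1, z, \dots, z^{n-1}), (1, z, \dots, z^{n-1})\bigr)$. Then $B$ is an
upper-triangular matrix all of whose diagonal entries equal 1. Thus $\det B = 1$ (by
9.48).

Let $C = \mathcal{M}\bigl(ST, (1, z, \dots, z^{n-1}), (e_1, \dots, e_n)\bigr)$. Thus $C = AB$ (by 3.81), which
implies that

$$\det A = (\det A)(\det B) = \det C.$$

The definitions of $C$, $S$, and $T$ show that $C$ equals

$$
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & \beta_2 - \beta_1 & 0 & \cdots & 0 \\
1 & \beta_3 - \beta_1 & (\beta_3 - \beta_1)(\beta_3 - \beta_2) & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \beta_n - \beta_1 & (\beta_n - \beta_1)(\beta_n - \beta_2) & \cdots & (\beta_n - \beta_1)(\beta_n - \beta_2) \cdots (\beta_n - \beta_{n-1})
\end{pmatrix}.
$$

Now $\det A = \det C = \prod_{1 \le j < k \le n} (\beta_k - \beta_j)$, where we have used 9.56(a) and 9.48.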
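
The Vandermonde formula can be sanity-checked on integer data. The sketch below is illustrative rather than anything from the book: the helpers `det`, `vandermonde`, and `vandermonde_product` are names invented for this example, and the determinant is computed by cofactor expansion along the first row, a convenience that this section does not itself develop.

```python
from math import prod


def det(a):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for k, entry in enumerate(a[0]):
        # Delete row 0 and column k to form the minor.
        minor = [row[:k] + row[k + 1:] for row in a[1:]]
        total += (-1) ** k * entry * det(minor)
    return total


def vandermonde(betas):
    """Matrix whose row j is 1, beta_j, beta_j^2, ..., beta_j^(n-1)."""
    n = len(betas)
    return [[b ** k for k in range(n)] for b in betas]


def vandermonde_product(betas):
    """Product of (beta_k - beta_j) over all index pairs j < k."""
    n = len(betas)
    return prod(betas[k] - betas[j] for j in range(n) for k in range(j + 1, n))
```

For `betas = [2, 3, 5, 7]`, both `det(vandermonde(betas))` and `vandermonde_product(betas)` evaluate to 240. A repeated value such as `[1, 4, 4]` makes both sides 0, consistent with two rows of the matrix being equal.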
Exercises 9C


1 Prove or give a counterexample: $S, T \in \mathcal{L}(V) \implies \det(S + T) = \det S + \det T$.


2 Suppose the first column of a square matrix $A$ consists of all zeros except
possibly the first entry $A_{1,1}$. Let $B$ be the matrix obtained from $A$ by deleting
the first row and the first column of $A$. Show that $\det A = A_{1,1} \det B$.


3 Suppose $T \in \mathcal{L}(V)$ is nilpotent. Prove that $\det(I + T) = 1$.


4 Suppose $S \in \mathcal{L}(V)$. Prove that $S$ is unitary if and only if $|\det S| = \|S\| = 1$.


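5 Suppose $A$ is a block upper-triangular matrix

$$
A = \begin{pmatrix}
A_1 & & * \\
& \ddots & \\
0 & & A_m
\end{pmatrix},
$$

where each $A_k$ along the diagonal is a square matrix. Prove that

$$\det A = (\det A_1) \cdots (\det A_m).$$


6 Suppose $A = (\, v_1 \ \cdots \ v_n \,)$ is an $n$-by-$n$ matrix, with $v_k$ denoting the $k$th
column of $A$. Show that if $(m_1, \dots, m_n) \in \operatorname{perm} n$, then

$$\det(\, v_{m_1} \ \cdots \ v_{m_n} \,) = \bigl(\operatorname{sign}(m_1, \dots, m_n)\bigr) \det A.$$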


7 Suppose $T \in \mathcal{L}(V)$ is invertible. Let $p$ denote the characteristic polynomial
of $T$ and let $q$ denote the characteristic polynomial of $T^{-1}$. Prove that

$$q(z) = \frac{1}{p(0)}\, z^{\dim V}\, p\Bigl(\frac{1}{z}\Bigr)$$

for all nonzero $z \in \mathbf{F}$.


8 Suppose $T \in \mathcal{L}(V)$ is an operator with no eigenvalues (which implies that
$\mathbf{F} = \mathbf{R}$). Prove that $\det T > 0$.


9 Suppose that $V$ is a real vector space of even dimension, $T \in \mathcal{L}(V)$, and
$\det T < 0$. Prove that $T$ has at least two distinct eigenvalues.


10 Suppose $V$ is a real vector space of odd dimension and $T \in \mathcal{L}(V)$. Without
using the minimal polynomial, prove that $T$ has an eigenvalue.

This result was previously proved without using determinants or the
characteristic polynomial (see 5.34).


11 Prove or give a counterexample: If $\mathbf{F} = \mathbf{R}$, $T \in \mathcal{L}(V)$, and $\det T > 0$, then
$T$ has a square root.

If $\mathbf{F} = \mathbf{C}$, $T \in \mathcal{L}(V)$, and $\det T \ne 0$, then $T$ has a square root (see 8.41).
12 Suppose $S, T \in \mathcal{L}(V)$ and $S$ is invertible. Define $p\colon \mathbf{F} \to \mathbf{F}$ by

$$p(z) = \det(zS - T).$$

Prove that $p$ is a polynomial of degree $\dim V$ and that the coefficient of $z^{\dim V}$
in this polynomial is $\det S$.


13 Suppose $\mathbf{F} = \mathbf{C}$, $T \in \mathcal{L}(V)$, and $n = \dim V > 2$. Let $\lambda_1, \dots, \lambda_n$ denote
the eigenvalues of $T$, with each eigenvalue included as many times as its
multiplicity.

(a) Find a formula for the coefficient of $z^{n-2}$ in the characteristic polynomial
of $T$ in terms of $\lambda_1, \dots, \lambda_n$.

(b) Find a formula for the coefficient of $z$ in the characteristic polynomial
of $T$ in terms of $\lambda_1, \dots, \lambda_n$.


14 Suppose $V$ is an inner product space and $T$ is a positive operator on $V$. Prove
that

$$\det \sqrt{T} = \sqrt{\det T}.$$


15 Suppose $V$ is an inner product space and $T \in \mathcal{L}(V)$. Use the polar
decomposition to give a proof that

$$|\det T| = \sqrt{\det(T^*T)}$$

that is different from the proof given earlier (see 9.60).


16 Suppose $T \in \mathcal{L}(V)$. Define $g\colon \mathbf{F} \to \mathbf{F}$ by $g(x) = \det(I + xT)$. Show that
$g'(0) = \operatorname{tr} T$.

Look for a clean solution to this exercise, without using the explicit but
complicated formula for the determinant of a matrix.


17 Suppose $a, b, c$ are positive numbers. Find the volume of the ellipsoid

$$\Bigl\{(x, y, z) \in \mathbf{R}^3 : \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} < 1\Bigr\}$$

by finding a set $\Omega \subseteq \mathbf{R}^3$ whose volume you know and an operator $T$ on $\mathbf{R}^3$
such that $T(\Omega)$ equals the ellipsoid above.


18 Suppose that $A$ is an invertible square matrix. Prove that Hadamard's
inequality (9.66) is an equality if and only if each column of $A$ is orthogonal
to the other columns.


19 Suppose $V$ is an inner product space, $e_1, \dots, e_n$ is an orthonormal basis of $V$,
and $T \in \mathcal{L}(V)$ is a positive operator.

(a) Prove that $\det T \le \prod_{k=1}^{n} \langle Te_k, e_k \rangle$.

(b) Prove that if $T$ is invertible, then the inequality in (a) is an equality if
and only if $e_k$ is an eigenvector of $T$ for each $k = 1, \dots, n$.
20 Suppose $A$ is an $n$-by-$n$ matrix, and suppose $c$ is such that $|A_{j,k}| \le c$ for all
$j, k \in \{1, \dots, n\}$. Prove that

$$|\det A| \le c^n n^{n/2}.$$

The formula for the determinant of a matrix (9.46) shows that $|\det A| \le c^n n!$.
However, the estimate given by this exercise is much better. For example, if
$c = 1$ and $n = 100$, then $c^n n! \approx 10^{158}$, but the estimate given by this exercise
is the much smaller number $10^{100}$. If $n$ is an integer power of 2, then the
inequality above is sharp and cannot be improved.


21 Suppose $n$ is a positive integer and $\delta\colon \mathbf{C}^{n,n} \to \mathbf{C}$ is a function such that

$$\delta(AB) = \delta(A) \cdot \delta(B)$$

for all $A, B \in \mathbf{C}^{n,n}$ and $\delta(A)$ equals the product of the diagonal entries of $A$
for each diagonal matrix $A \in \mathbf{C}^{n,n}$. Prove that

$$\delta(A) = \det A$$

for all $A \in \mathbf{C}^{n,n}$.

Recall that $\mathbf{C}^{n,n}$ denotes the set of $n$-by-$n$ matrices with entries in $\mathbf{C}$. This
exercise shows that the determinant is the unique function defined on square
matrices that is multiplicative and has the desired behavior on diagonal
matrices. This result is analogous to Exercise 10 in Section 8D, which
shows that the trace is uniquely determined by its algebraic properties.


I find that in my own elementary lectures, I have, for pedagogical reasons, pushed 
determinants more and more into the background. Too often I have had the
experience that, while the students acquired facility with the formulas, which are so
useful in abbreviating long expressions, they often failed to gain familiarity with 
their meaning, and skill in manipulation prevented the student from going into all 
the details of the subject and so gaining a mastery. 


—Elementary Mathematics from an Advanced Standpoint: Geometry, Felix Klein 


9D Tensor Products 


Tensor Product of Two Vector Spaces 


The motivation for our next topic comes from wanting to form the product of
a vector $v \in V$ and a vector $w \in W$. This product will be denoted by $v \otimes w$,
pronounced "$v$ tensor $w$", and will be an element of some new vector space called
$V \otimes W$ (also pronounced "$V$ tensor $W$").

We already have a vector space $V \times W$ (see Section 3E), called the product of
$V$ and $W$. However, $V \times W$ will not serve our purposes here because it does not
provide a natural way to multiply an element of $V$ by an element of $W$. We would
like our tensor product to satisfy some of the usual properties of multiplication.
For example, we would like the distributive property to be satisfied, meaning that
if $v_1, v_2, v \in V$ and $w_1, w_2, w \in W$, then

$$(v_1 + v_2) \otimes w = v_1 \otimes w + v_2 \otimes w \quad\text{and}\quad v \otimes (w_1 + w_2) = v \otimes w_1 + v \otimes w_2.$$


We would also like scalar multiplication to interact well with this new
multiplication, meaning that

$$\lambda(v \otimes w) = (\lambda v) \otimes w = v \otimes (\lambda w)$$

for all $\lambda \in \mathbf{F}$, $v \in V$, and $w \in W$.

To produce $\otimes$ in TeX, type \otimes.

Furthermore, it would be nice if each basis of $V$ when combined with each
basis of $W$ produced a basis of $V \otimes W$. Specifically, if $e_1, \dots, e_m$ is a basis of $V$
and $f_1, \dots, f_n$ is a basis of $W$, then we would like a list (in any order) consisting
of $e_j \otimes f_k$, as $j$ ranges from 1 to $m$ and $k$ ranges from 1 to $n$, to be a basis of
$V \otimes W$. This implies that $\dim(V \otimes W)$ should equal $(\dim V)(\dim W)$. Recall that
$\dim(V \times W) = \dim V + \dim W$ (see 3.92), which shows that the product $V \times W$
will not serve our purposes here.

To produce a vector space whose dimension is $(\dim V)(\dim W)$ in a natural
fashion from $V$ and $W$, we look at the vector space of bilinear functionals, as
defined below.


9.68 definition: bilinear functional on $V \times W$, the vector space $\mathcal{B}(V, W)$


• A bilinear functional on $V \times W$ is a function $\beta\colon V \times W \to \mathbf{F}$ such that
$v \mapsto \beta(v, w)$ is a linear functional on $V$ for each $w \in W$ and $w \mapsto \beta(v, w)$
is a linear functional on $W$ for each $v \in V$.

• The vector space of bilinear functionals on $V \times W$ is denoted by $\mathcal{B}(V, W)$.


If $W = V$, then a bilinear functional on $V \times W$ is a bilinear form; see 9.1.

The operations of addition and scalar multiplication on $\mathcal{B}(V, W)$ are defined
to be the usual operations of addition and scalar multiplication of functions. As
you can verify, these operations make $\mathcal{B}(V, W)$ into a vector space whose additive
identity is the zero function from $V \times W$ to $\mathbf{F}$.
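9.69 example: bilinear functionals


• Suppose $\varphi \in V'$ and $\tau \in W'$. Define $\beta\colon V \times W \to \mathbf{F}$ by $\beta(v, w) = \varphi(v)\tau(w)$.
Then $\beta$ is a bilinear functional on $V \times W$.

• Suppose $v \in V$ and $w \in W$. Define $\beta\colon V' \times W' \to \mathbf{F}$ by $\beta(\varphi, \tau) = \varphi(v)\tau(w)$.
Then $\beta$ is a bilinear functional on $V' \times W'$.

• Define $\beta\colon V \times V' \to \mathbf{F}$ by $\beta(v, \varphi) = \varphi(v)$. Then $\beta$ is a bilinear functional on
$V \times V'$.

• Suppose $\varphi \in V'$. Define $\beta\colon V \times \mathcal{L}(V) \to \mathbf{F}$ by $\beta(v, T) = \varphi(Tv)$. Then $\beta$ is a
bilinear functional on $V \times \mathcal{L}(V)$.

• Suppose $m$ and $n$ are positive integers. Define $\beta\colon \mathbf{F}^{m,n} \times \mathbf{F}^{n,m} \to \mathbf{F}$ by
$\beta(A, B) = \operatorname{tr}(AB)$. Then $\beta$ is a bilinear functional on $\mathbf{F}^{m,n} \times \mathbf{F}^{n,m}$.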

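9.70 dimension of the vector space of bilinear functionals


$\dim \mathcal{B}(V, W) = (\dim V)(\dim W)$.


Proof Let $e_1, \dots, e_m$ be a basis of $V$ and $f_1, \dots, f_n$ be a basis of $W$. For a bilinear
functional $\beta \in \mathcal{B}(V, W)$, let $\mathcal{M}(\beta)$ be the $m$-by-$n$ matrix whose entry in row $j$,
column $k$ is $\beta(e_j, f_k)$. The map $\beta \mapsto \mathcal{M}(\beta)$ is a linear map of $\mathcal{B}(V, W)$ into $\mathbf{F}^{m,n}$.

For a matrix $C \in \mathbf{F}^{m,n}$, define a bilinear functional $\beta_C$ on $V \times W$ by

$$\beta_C(a_1e_1 + \cdots + a_me_m,\; b_1f_1 + \cdots + b_nf_n) = \sum_{k=1}^{n} \sum_{j=1}^{m} C_{j,k}\, a_j b_k$$

for $a_1, \dots, a_m, b_1, \dots, b_n \in \mathbf{F}$.

The linear map $\beta \mapsto \mathcal{M}(\beta)$ from $\mathcal{B}(V, W)$ to $\mathbf{F}^{m,n}$ and the linear map $C \mapsto \beta_C$
from $\mathbf{F}^{m,n}$ to $\mathcal{B}(V, W)$ are inverses of each other because $\beta_{\mathcal{M}(\beta)} = \beta$ for all
$\beta \in \mathcal{B}(V, W)$ and $\mathcal{M}(\beta_C) = C$ for all $C \in \mathbf{F}^{m,n}$, as you should verify.

Thus both maps are isomorphisms and the two spaces that they connect have the
same dimension. Hence $\dim \mathcal{B}(V, W) = \dim \mathbf{F}^{m,n} = mn = (\dim V)(\dim W)$.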
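
The correspondence between bilinear functionals and matrices used in this proof can be made concrete for $V = \mathbf{F}^m$ and $W = \mathbf{F}^n$ with their standard bases. The sketch below is only an illustration under that assumption; `bilinear_from_matrix` and `matrix_of` are hypothetical names standing for the maps $C \mapsto \beta_C$ and $\beta \mapsto \mathcal{M}(\beta)$, and vectors are represented by their coordinate lists.

```python
def bilinear_from_matrix(c):
    """Return beta_C, acting on coordinate lists a (length m) and b (length n)."""
    def beta(a, b):
        return sum(
            c[j][k] * a[j] * b[k]
            for j in range(len(a))
            for k in range(len(b))
        )
    return beta


def matrix_of(beta, m, n):
    """Return M(beta): the entry in row j, column k is beta(e_j, f_k)."""
    def unit(size, i):
        # i-th standard basis vector of length `size`
        return [1 if r == i else 0 for r in range(size)]
    return [[beta(unit(m, j), unit(n, k)) for k in range(n)] for j in range(m)]
```

Starting from `c = [[1, 2, 3], [4, 5, 6]]`, the round trip `matrix_of(bilinear_from_matrix(c), 2, 3)` returns `c` again, which is one instance of the identity $\mathcal{M}(\beta_C) = C$ that the proof asks the reader to verify.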


Several different definitions of $V \otimes W$ appear in the mathematical literature.
These definitions are equivalent to each other, at least in the finite-dimensional
context, because any two vector spaces of the same dimension are isomorphic.

The result above states that $\mathcal{B}(V, W)$ has the dimension that we seek, as do
$\mathcal{L}(V, W)$ and $\mathbf{F}^{\dim V,\, \dim W}$. Thus it may be tempting to define $V \otimes W$ to be $\mathcal{B}(V, W)$
or $\mathcal{L}(V, W)$ or $\mathbf{F}^{\dim V,\, \dim W}$. However, none of those definitions would lead to a
basis-free definition of $v \otimes w$ for $v \in V$ and $w \in W$.

The following definition, while it may seem a bit strange and abstract at first,
has the huge advantage that it defines $v \otimes w$ in a basis-free fashion. We define
$V \otimes W$ to be the vector space of bilinear functionals on $V' \times W'$ instead of the
more tempting choice of the vector space of bilinear functionals on $V \times W$.
9.71 definition: tensor product, $V \otimes W$, $v \otimes w$


• The tensor product $V \otimes W$ is defined to be $\mathcal{B}(V', W')$.

• For $v \in V$ and $w \in W$, the tensor product $v \otimes w$ is the element of $V \otimes W$
defined by

$$(v \otimes w)(\varphi, \tau) = \varphi(v)\tau(w)$$

for all $(\varphi, \tau) \in V' \times W'$.


We can quickly prove that the definition of $V \otimes W$ gives it the desired dimension.


9.72 dimension of the tensor product of two vector spaces


$\dim(V \otimes W) = (\dim V)(\dim W)$.


Proof Because a vector space and its dual have the same dimension (by 3.111),
we have $\dim V' = \dim V$ and $\dim W' = \dim W$. Thus 9.70 tells us that the
dimension of $\mathcal{B}(V', W')$ equals $(\dim V)(\dim W)$.


To understand the definition of the tensor product $v \otimes w$ of two vectors $v \in V$
and $w \in W$, focus on the kind of object it is. An element of $V \otimes W$ is a bilinear
functional on $V' \times W'$, and in particular it is a function from $V' \times W'$ to $\mathbf{F}$. Thus for
each element of $V' \times W'$, it should produce an element of $\mathbf{F}$. The definition above
has this behavior, because $v \otimes w$ applied to a typical element $(\varphi, \tau)$ of $V' \times W'$
produces the number $\varphi(v)\tau(w)$.

The somewhat abstract nature of $v \otimes w$ should not matter. The important point
is the behavior of these objects. The next result shows that tensor products of
vectors have the desired bilinearity properties.


9.73 bilinearity of tensor product 


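Suppose $v, v_1, v_2 \in V$ and $w, w_1, w_2 \in W$ and $\lambda \in \mathbf{F}$. Then

$$(v_1 + v_2) \otimes w = v_1 \otimes w + v_2 \otimes w \quad\text{and}\quad v \otimes (w_1 + w_2) = v \otimes w_1 + v \otimes w_2$$

and

$$\lambda(v \otimes w) = (\lambda v) \otimes w = v \otimes (\lambda w).$$


Proof Suppose $(\varphi, \tau) \in V' \times W'$. Then

$$
\begin{aligned}
\bigl((v_1 + v_2) \otimes w\bigr)(\varphi, \tau) &= \varphi(v_1 + v_2)\tau(w) \\
&= \varphi(v_1)\tau(w) + \varphi(v_2)\tau(w) \\
&= (v_1 \otimes w)(\varphi, \tau) + (v_2 \otimes w)(\varphi, \tau) \\
&= (v_1 \otimes w + v_2 \otimes w)(\varphi, \tau).
\end{aligned}
$$

Thus $(v_1 + v_2) \otimes w = v_1 \otimes w + v_2 \otimes w$.
The other two equalities are proved similarly.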
Lists are, by definition, ordered. The order matters when, for example, we
form the matrix of an operator with respect to a basis. For lists in this section
with two indices, such as $\{e_j \otimes f_k\}_{j=1,\dots,m;\,k=1,\dots,n}$ in the next result, the ordering
does not matter and we do not specify it; just choose any convenient ordering.

The linear independence of elements of $V \otimes W$ in (a) of the result below
captures the idea that there are no relationships among vectors in $V \otimes W$ other
than the relationships that come from bilinearity of the tensor product (see 9.73)
and the relationships that may be present due to linear dependence of a list of
vectors in $V$ or a list of vectors in $W$.


9.74 basis of V®W 


Suppose é,, ..., @,, is a list of vectors in V and f,,..., f,, is a list of vectors in W. 


(a) Ife,,...,e,, and f,,..., f,, are both linearly independent lists, then 


{e; ® ie See ea 


is a linearly independent list in V@ W. 


(b) If e,,...,e,, is a basis of V and f,,..., f,, is a basis of W, then the list 
{e; ® fi}j=1,....m;k=1,..,n 18 a basis of V@ W. 


Proof To prove (a), suppose $e_1, \dots, e_m$ and $f_1, \dots, f_n$ are both linearly independent
lists. This linear independence and the linear map lemma (3.4) imply that there
exist $\varphi_1, \dots, \varphi_m \in V'$ and $\tau_1, \dots, \tau_n \in W'$ such that


\[ \varphi_j(e_k) = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k \end{cases} \quad\text{and}\quad \tau_j(f_k) = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k, \end{cases} \]


where $j, k \in \{1, \dots, m\}$ in the first equation and $j, k \in \{1, \dots, n\}$ in the second
equation.
Suppose $\{a_{j,k}\}_{j=1,\dots,m;\,k=1,\dots,n}$ is a list of scalars such that


9.75   \[ \sum_{k=1}^{n} \sum_{j=1}^{m} a_{j,k}(e_j \otimes f_k) = 0. \]


Note that $(e_j \otimes f_k)(\varphi_M, \tau_N)$ equals 1 if $j = M$ and $k = N$, and equals 0 otherwise.
Thus applying both sides of 9.75 to $(\varphi_M, \tau_N)$ shows that $a_{M,N} = 0$, proving that
$\{e_j \otimes f_k\}_{j=1,\dots,m;\,k=1,\dots,n}$ is linearly independent.

Now (b) follows from (a), the equation $\dim V \otimes W = (\dim V)(\dim W)$ [see
9.72], and the result that a linearly independent list of the right length is a basis
(see 2.38).
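
The key step of the proof, applying a linear combination to $(\varphi_M, \tau_N)$ to isolate the coefficient $a_{M,N}$, can be illustrated in a small case. The sketch below assumes the standard bases of $\mathbf{F}^2$ and $\mathbf{F}^3$, so that the functionals $\varphi_M$ and $\tau_N$ are coordinate functionals; indices are 0-based in the code and all helper names are ours.

```python
m, n = 2, 3

def unit(i, size):
    """Standard basis vector with a 1 in position i (0-based)."""
    return tuple(1 if t == i else 0 for t in range(size))

def coordinate(i):
    """Coordinate functional x -> x[i]; sends basis vector k to 1 if k == i, else 0."""
    return lambda x: x[i]

def tensor(v, w):
    """v (x) w as the bilinear functional (phi, tau) -> phi(v) * tau(w)."""
    return lambda phi, tau: phi(v) * tau(w)

# Arbitrary sample coefficients a[j][k].
a = [[5, -1, 2],
     [0, 3, 7]]

def combination(phi, tau):
    """The sum of a[j][k] * (e_j (x) f_k), evaluated at (phi, tau)."""
    return sum(a[j][k] * tensor(unit(j, m), unit(k, n))(phi, tau)
               for j in range(m) for k in range(n))

# Applying the combination to (phi_M, tau_N) returns exactly a[M][N].
recovered = [[combination(coordinate(M), coordinate(N)) for N in range(n)]
             for M in range(m)]
print(recovered == a)
```

In particular, if the combination were the zero functional, every recovered coefficient would be 0, which is the linear independence asserted in (a).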


Every element of $V \otimes W$ is a finite sum of elements of the form $v \otimes w$, where
$v \in V$ and $w \in W$, as implied by (b) in the result above. However, if $\dim V > 1$
and $\dim W > 1$, then Exercise 4 shows that


\[ \{v \otimes w : (v, w) \in V \times W\} \ne V \otimes W. \]
9.76 example: tensor product of element of $\mathbf{F}^m$ with element of $\mathbf{F}^n$


Suppose $m$ and $n$ are positive integers. Let $e_1, \dots, e_m$ denote the standard basis
of $\mathbf{F}^m$ and let $f_1, \dots, f_n$ denote the standard basis of $\mathbf{F}^n$. Suppose


\[ v = (v_1, \dots, v_m) \in \mathbf{F}^m \quad\text{and}\quad w = (w_1, \dots, w_n) \in \mathbf{F}^n. \]


Then


\begin{align*}
v \otimes w &= \Bigl(\sum_{j=1}^{m} v_j e_j\Bigr) \otimes \Bigl(\sum_{k=1}^{n} w_k f_k\Bigr) \\
&= \sum_{k=1}^{n} \sum_{j=1}^{m} (v_j w_k)(e_j \otimes f_k).
\end{align*}


Thus with respect to the basis $\{e_j \otimes f_k\}_{j=1,\dots,m;\,k=1,\dots,n}$ of $\mathbf{F}^m \otimes \mathbf{F}^n$ provided
by 9.74(b), the coefficients of $v \otimes w$ are the numbers $\{v_j w_k\}_{j=1,\dots,m;\,k=1,\dots,n}$. If
instead of writing these numbers in a list, we write them in an m-by-n matrix with
$v_j w_k$ in row $j$, column $k$, then we can identify $v \otimes w$ with the m-by-n matrix


\[ \begin{pmatrix} v_1 w_1 & \cdots & v_1 w_n \\ \vdots & \ddots & \vdots \\ v_m w_1 & \cdots & v_m w_n \end{pmatrix}. \]


See Exercises 5 and 6 for practice in using the identification from the example
above.
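
As a concrete instance of this identification, the following sketch builds the matrix of $v \otimes w$ for one choice of $v \in \mathbf{F}^2$ and $w \in \mathbf{F}^3$. The function name `outer` and the sample vectors are our own choices.

```python
def outer(v, w):
    """Matrix identified with v (x) w: entry v[j] * w[k] in row j, column k."""
    return [[vj * wk for wk in w] for vj in v]

v = (1, 2)        # an element of F^2
w = (3, 0, -1)    # an element of F^3

matrix = outer(v, w)
for row in matrix:
    print(row)
```

Each row of the resulting 2-by-3 matrix is a scalar multiple of $w$, which is consistent with the rank statement in Exercise 5.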

We now define bilinear maps, which differ from bilinear functionals in that 
the target space can be an arbitrary vector space rather than just the scalar field. 


9.77 definition: bilinear map 


A bilinear map from $V \times W$ to a vector space $U$ is a function $\Gamma\colon V \times W \to U$
such that $v \mapsto \Gamma(v, w)$ is a linear map from $V$ to $U$ for each $w \in W$ and
$w \mapsto \Gamma(v, w)$ is a linear map from $W$ to $U$ for each $v \in V$.


9.78 example: bilinear maps 


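• Every bilinear functional on $V \times W$ is a bilinear map from $V \times W$ to $\mathbf{F}$.


• The function $\Gamma\colon V \times W \to V \otimes W$ defined by $\Gamma(v, w) = v \otimes w$ is a bilinear map
from $V \times W$ to $V \otimes W$ (by 9.73).

• The function $\Gamma\colon \mathcal{L}(V) \times \mathcal{L}(V) \to \mathcal{L}(V)$ defined by $\Gamma(S, T) = ST$ is a bilinear
map from $\mathcal{L}(V) \times \mathcal{L}(V)$ to $\mathcal{L}(V)$.

• The function $\Gamma\colon V \times \mathcal{L}(V, W) \to W$ defined by $\Gamma(v, T) = Tv$ is a bilinear map
from $V \times \mathcal{L}(V, W)$ to $W$.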
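
The third example can be checked on sample data. The sketch below represents operators on $\mathbf{F}^2$ by 2-by-2 integer matrices (an assumption made only for illustration) and tests additivity in each slot and homogeneity of $\Gamma(S, T) = ST$; the helper names are ours.

```python
def matmul(S, T):
    """Product of two square matrices given as lists of rows."""
    n = len(S)
    return [[sum(S[i][k] * T[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mscale(c, A):
    return [[c * a for a in row] for row in A]

S1 = [[1, 2], [3, 4]]
S2 = [[0, -1], [5, 2]]
T = [[2, 0], [1, 3]]
lam = 4

# Gamma(S1 + S2, T) = Gamma(S1, T) + Gamma(S2, T)
additive_first = matmul(madd(S1, S2), T) == madd(matmul(S1, T), matmul(S2, T))
# Gamma(T, S1 + S2) = Gamma(T, S1) + Gamma(T, S2)
additive_second = matmul(T, madd(S1, S2)) == madd(matmul(T, S1), matmul(T, S2))
# Gamma(lam S1, T) = lam Gamma(S1, T) = Gamma(S1, lam T)
homogeneous = (matmul(mscale(lam, S1), T) == mscale(lam, matmul(S1, T))
               == matmul(S1, mscale(lam, T)))

print(additive_first, additive_second, homogeneous)
```

Note that $\Gamma(S, T) = ST$ is bilinear but not symmetric: in general $ST \ne TS$.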
Tensor products allow us to convert bilinear maps on $V \times W$ into linear maps on
$V \otimes W$ (and vice versa), as shown by the next result. In the mathematical literature,
(a) of the result below is called the “universal property” of tensor products.


9.79 converting bilinear maps to linear maps


Suppose $U$ is a vector space.


(a) Suppose $\Gamma\colon V \times W \to U$ is a bilinear map. Then there exists a unique
linear map $\hat{\Gamma}\colon V \otimes W \to U$ such that


\[ \hat{\Gamma}(v \otimes w) = \Gamma(v, w) \]


for all $(v, w) \in V \times W$.


(b) Conversely, suppose $T\colon V \otimes W \to U$ is a linear map. Then there exists a
unique bilinear map $T^*\colon V \times W \to U$ such that


\[ T^*(v, w) = T(v \otimes w) \]


for all $(v, w) \in V \times W$.


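Proof Let $e_1, \dots, e_m$ be a basis of $V$ and let $f_1, \dots, f_n$ be a basis of $W$. By the linear
map lemma (3.4) and 9.74(b), there exists a unique linear map $\hat{\Gamma}\colon V \otimes W \to U$
such that

\[ \hat{\Gamma}(e_j \otimes f_k) = \Gamma(e_j, f_k) \]

for all $j \in \{1, \dots, m\}$ and $k \in \{1, \dots, n\}$.
Now suppose $(v, w) \in V \times W$. There exist $a_1, \dots, a_m, b_1, \dots, b_n \in \mathbf{F}$ such that
$v = a_1 e_1 + \cdots + a_m e_m$ and $w = b_1 f_1 + \cdots + b_n f_n$. Thus


\begin{align*}
\hat{\Gamma}(v \otimes w) &= \hat{\Gamma}\Bigl(\sum_{k=1}^{n} \sum_{j=1}^{m} (a_j b_k)(e_j \otimes f_k)\Bigr) \\
&= \sum_{k=1}^{n} \sum_{j=1}^{m} a_j b_k\, \hat{\Gamma}(e_j \otimes f_k) \\
&= \sum_{k=1}^{n} \sum_{j=1}^{m} a_j b_k\, \Gamma(e_j, f_k) \\
&= \Gamma(v, w),
\end{align*}


as desired, where the second line holds because $\hat{\Gamma}$ is linear, the third line holds by
the definition of $\hat{\Gamma}$, and the fourth line holds because $\Gamma$ is bilinear.

The uniqueness of the linear map $\hat{\Gamma}$ satisfying $\hat{\Gamma}(v \otimes w) = \Gamma(v, w)$ follows
from 9.74(b), completing the proof of (a).

To prove (b), define a function $T^*\colon V \times W \to U$ by $T^*(v, w) = T(v \otimes w)$ for all
$(v, w) \in V \times W$. The bilinearity of the tensor product (see 9.73) and the linearity
of $T$ imply that $T^*$ is bilinear.

The equation $T^*(v, w) = T(v \otimes w)$ determines the value of $T^*$ at every element of
$V \times W$, so the choice of $T^*$ that satisfies the conditions is unique.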
To prove 9.79(a), we could not just define $\hat{\Gamma}(v \otimes w) = \Gamma(v, w)$ for all $v \in V$
and $w \in W$ (and then extend $\hat{\Gamma}$ linearly to all of $V \otimes W$) because elements of
$V \otimes W$ do not have unique representations as finite sums of elements of the form
$v \otimes w$. Our proof used a basis of $V$ and a basis of $W$ to get around this problem.

Although our construction of $\hat{\Gamma}$ in the proof of 9.79(a) depended on a basis of
$V$ and a basis of $W$, the equation $\hat{\Gamma}(v \otimes w) = \Gamma(v, w)$ that holds for all $v \in V$ and
$w \in W$ shows that $\hat{\Gamma}$ does not depend on the choice of bases for $V$ and $W$.
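
The construction in the proof of 9.79(a) can be carried out explicitly in a small case. The sketch below assumes $V = W = U = \mathbf{R}^3$ with standard bases, takes the cross product as a sample bilinear map $\Gamma$, and identifies elements of $V \otimes W$ with 3-by-3 coefficient matrices as in Example 9.76. The names `cross`, `outer`, and `gamma_hat` are ours.

```python
def cross(v, w):
    """Cross product on R^3, used here as a sample bilinear map Gamma."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])

def unit(i):
    """Standard basis vector of R^3 with a 1 in position i (0-based)."""
    return tuple(1 if t == i else 0 for t in range(3))

def outer(v, w):
    """Coefficient matrix of v (x) w with respect to the basis e_j (x) f_k."""
    return [[vj * wk for wk in w] for vj in v]

def gamma_hat(A):
    """Linear map on coefficient matrices: sum of A[j][k] * Gamma(e_j, f_k)."""
    total = (0, 0, 0)
    for j in range(3):
        for k in range(3):
            g = cross(unit(j), unit(k))
            total = tuple(t + A[j][k] * gi for t, gi in zip(total, g))
    return total

v, w = (1, 2, 3), (4, 5, 6)
print(gamma_hat(outer(v, w)), cross(v, w))

# gamma_hat is defined on all of V (x) W, not only on elements v (x) w.
sym = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # e_1 (x) f_2 + e_2 (x) f_1
print(gamma_hat(sym))
```

Because `gamma_hat` is defined on coefficients with respect to a basis, the question of non-unique representations as sums of elements $v \otimes w$ never arises.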


Tensor Product of Inner Product Spaces 


The result below features three inner products: one on $V \otimes W$, one on $V$, and one
on $W$, although we use the same symbol $\langle \cdot, \cdot \rangle$ for all three inner products.


9.80 inner product on tensor product of two inner product spaces


Suppose $V$ and $W$ are inner product spaces. Then there is a unique inner
product on $V \otimes W$ such that


\[ \langle v \otimes w, u \otimes x \rangle = \langle v, u \rangle \langle w, x \rangle \]


for all $v, u \in V$ and $w, x \in W$.


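Proof Suppose $e_1, \dots, e_m$ is an orthonormal basis of $V$ and $f_1, \dots, f_n$ is an
orthonormal basis of $W$. Define an inner product on $V \otimes W$ by


9.81   \[ \Bigl\langle \sum_{k=1}^{n} \sum_{j=1}^{m} b_{j,k}\, e_j \otimes f_k,\; \sum_{k=1}^{n} \sum_{j=1}^{m} c_{j,k}\, e_j \otimes f_k \Bigr\rangle = \sum_{k=1}^{n} \sum_{j=1}^{m} b_{j,k}\, \overline{c_{j,k}}. \]


The straightforward verification that 9.81 defines an inner product on $V \otimes W$
is left to the reader [use 9.74(b)].

Suppose that $v, u \in V$ and $w, x \in W$. Let $v_1, \dots, v_m \in \mathbf{F}$ be such that
$v = v_1 e_1 + \cdots + v_m e_m$, with similar expressions for $u$, $w$, and $x$. Then


\begin{align*}
\langle v \otimes w, u \otimes x \rangle &= \Bigl\langle \sum_{j=1}^{m} v_j e_j \otimes \sum_{k=1}^{n} w_k f_k,\; \sum_{j=1}^{m} u_j e_j \otimes \sum_{k=1}^{n} x_k f_k \Bigr\rangle \\
&= \Bigl\langle \sum_{k=1}^{n} \sum_{j=1}^{m} v_j w_k\, e_j \otimes f_k,\; \sum_{k=1}^{n} \sum_{j=1}^{m} u_j x_k\, e_j \otimes f_k \Bigr\rangle \\
&= \sum_{k=1}^{n} \sum_{j=1}^{m} v_j \overline{u_j}\, w_k \overline{x_k} \\
&= \Bigl(\sum_{j=1}^{m} v_j \overline{u_j}\Bigr) \Bigl(\sum_{k=1}^{n} w_k \overline{x_k}\Bigr) \\
&= \langle v, u \rangle \langle w, x \rangle.
\end{align*}


There is only one inner product on $V \otimes W$ such that $\langle v \otimes w, u \otimes x \rangle = \langle v, u \rangle \langle w, x \rangle$
for all $v, u \in V$ and $w, x \in W$ because every element of $V \otimes W$ can be written as
a linear combination of elements of the form $v \otimes w$ [by 9.74(b)].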
The definition below of a natural inner product on $V \otimes W$ is now justified by
9.80. We could not have simply defined $\langle v \otimes w, u \otimes x \rangle$ to be $\langle v, u \rangle \langle w, x \rangle$ (and then
used additivity in each slot separately to extend the definition to $V \otimes W$) without
some proof because elements of $V \otimes W$ do not have unique representations as
finite sums of elements of the form $v \otimes w$.


9.82 definition: inner product on tensor product of two inner product spaces


Suppose $V$ and $W$ are inner product spaces. The inner product on $V \otimes W$ is
the unique function $\langle \cdot, \cdot \rangle$ from $(V \otimes W) \times (V \otimes W)$ to $\mathbf{F}$ such that


\[ \langle v \otimes w, u \otimes x \rangle = \langle v, u \rangle \langle w, x \rangle \]


for all $v, u \in V$ and $w, x \in W$.


Take $u = v$ and $x = w$ in the equation above and then take square roots to
show that

\[ \|v \otimes w\| = \|v\|\, \|w\| \]

for all $v \in V$ and all $w \in W$.
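
The defining equation can be checked numerically in coordinates. The sketch below assumes $V = \mathbf{C}^2$ and $W = \mathbf{C}^3$ with their standard inner products and standard bases, identifies $v \otimes w$ with a matrix as in Example 9.76, and computes the inner product on $V \otimes W$ by formula 9.81. The sample vectors have small integer real and imaginary parts so that the floating-point arithmetic is exact; all names are ours.

```python
def outer(v, w):
    """Coefficient matrix of v (x) w: entry v[j] * w[k] in row j, column k."""
    return [[vj * wk for wk in w] for vj in v]

def inner(x, y):
    """Standard inner product on C^n: sum of x_i * conj(y_i)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def inner_tensor(B, C):
    """Formula 9.81 in coordinates: sum of B[j][k] * conj(C[j][k])."""
    return sum(b * complex(c).conjugate()
               for rb, rc in zip(B, C) for b, c in zip(rb, rc))

v, u = (1 + 2j, 3), (2, 1j)
w, x = (1j, 1, 2), (1, 0, 1 - 1j)

lhs = inner_tensor(outer(v, w), outer(u, x))   # <v (x) w, u (x) x>
rhs = inner(v, u) * inner(w, x)                # <v, u> <w, x>
print(lhs, rhs)

# Squared norms: ||v (x) w||^2 equals ||v||^2 ||w||^2.
norm_sq = inner_tensor(outer(v, w), outer(v, w))
print(norm_sq, inner(v, v) * inner(w, w))
```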

The construction of the inner product in the proof of 9.80 depended on an
orthonormal basis $e_1, \dots, e_m$ of $V$ and an orthonormal basis $f_1, \dots, f_n$ of $W$. Formula
9.81 implies that $\{e_j \otimes f_k\}_{j=1,\dots,m;\,k=1,\dots,n}$ is a doubly indexed orthonormal list in
$V \otimes W$ and hence is an orthonormal basis of $V \otimes W$ [by 9.74(b)]. The importance
of the next result arises because the orthonormal bases used there can be different
from the orthonormal bases used to define the inner product in 9.80. Although
the notation for the bases is the same in the proof of 9.80 and in the result below,
think of them as two different sets of orthonormal bases.


9.83 orthonormal basis of $V \otimes W$


Suppose $V$ and $W$ are inner product spaces, and $e_1, \dots, e_m$ is an orthonormal
basis of $V$ and $f_1, \dots, f_n$ is an orthonormal basis of $W$. Then


\[ \{e_j \otimes f_k\}_{j=1,\dots,m;\,k=1,\dots,n} \]


is an orthonormal basis of $V \otimes W$.


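Proof We know that $\{e_j \otimes f_k\}_{j=1,\dots,m;\,k=1,\dots,n}$ is a basis of $V \otimes W$ [by 9.74(b)].
Thus we only need to verify orthonormality. To do this, suppose $j, M \in \{1, \dots, m\}$
and $k, N \in \{1, \dots, n\}$. Then


\[ \langle e_j \otimes f_k, e_M \otimes f_N \rangle = \langle e_j, e_M \rangle \langle f_k, f_N \rangle = \begin{cases} 1 & \text{if } j = M \text{ and } k = N, \\ 0 & \text{otherwise.} \end{cases} \]


Hence the doubly indexed list $\{e_j \otimes f_k\}_{j=1,\dots,m;\,k=1,\dots,n}$ is indeed an orthonormal
basis of $V \otimes W$.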
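
To see 9.83 at work with bases other than the ones used to build the inner product, the sketch below takes $V = W = \mathbf{R}^2$, computes the inner product on $V \otimes W$ in standard coordinates (formula 9.81 with the standard bases), and then checks orthonormality of the products formed from two rotated orthonormal bases. Exact rational arithmetic avoids rounding; the particular bases and the helper names are our own choices.

```python
from fractions import Fraction as Fr

def outer(v, w):
    """Coefficient matrix of v (x) w with respect to the standard bases."""
    return [[vj * wk for wk in w] for vj in v]

def inner_tensor(B, C):
    """Formula 9.81 for real coefficient matrices (no conjugation needed)."""
    return sum(b * c for rb, rc in zip(B, C) for b, c in zip(rb, rc))

# Orthonormal bases of R^2 that differ from the standard basis.
e = [(Fr(3, 5), Fr(4, 5)), (Fr(-4, 5), Fr(3, 5))]
f = [(Fr(5, 13), Fr(12, 13)), (Fr(-12, 13), Fr(5, 13))]

# The four products e_j (x) f_k and their matrix of pairwise inner products.
products = [outer(ej, fk) for ej in e for fk in f]
gram = [[inner_tensor(P, Q) for Q in products] for P in products]

identity = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
print(gram == identity)
```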


See Exercise 11 for an example of how the inner product structure on $V \otimes W$
interacts with operators on $V$ and $W$.
Tensor Product of Multiple Vector Spaces


We have been discussing properties of the tensor product of two finite-dimensional 
vector spaces. Now we turn our attention to the tensor product of multiple
finite-dimensional vector spaces. This generalization requires no new ideas, only some
slightly more complicated notation. Readers with a good understanding of the 
tensor product of two vector spaces should be able to make the extension to the 
tensor product of more than two vector spaces. 

Thus in this subsection, no proofs will be provided. The definitions and the 
statements of results that will be provided should be enough information to enable 
readers to fill in the details, using what has already been learned about the tensor 
product of two vector spaces. 

We begin with the following notational assumption. 


9.84 notation: $V_1, \dots, V_m$


For the rest of this subsection, $m$ denotes an integer greater than 1 and
$V_1, \dots, V_m$ denote finite-dimensional vector spaces.


The notion of an m-linear functional, which we are about to define, generalizes
the notion of a bilinear functional (see 9.68). Recall that the use of the word
“functional” indicates that we are mapping into the scalar field $\mathbf{F}$. Recall also that
the terminology “m-linear form” is used in the special case $V_1 = \cdots = V_m$ (see
9.25). The notation $\mathcal{B}(V_1, \dots, V_m)$ generalizes our previous notation $\mathcal{B}(V, W)$.


9.85 definition: m-linear functional, the vector space $\mathcal{B}(V_1, \dots, V_m)$


• An m-linear functional on $V_1 \times \cdots \times V_m$ is a function $\beta\colon V_1 \times \cdots \times V_m \to \mathbf{F}$
that is a linear functional in each slot when the other slots are held fixed.


• The vector space of m-linear functionals on $V_1 \times \cdots \times V_m$ is denoted by
$\mathcal{B}(V_1, \dots, V_m)$.


9.86 example: m-linear functional 


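Suppose $\varphi_k \in (V_k)'$ for each $k \in \{1, \dots, m\}$. Define $\beta\colon V_1 \times \cdots \times V_m \to \mathbf{F}$ by


\[ \beta(v_1, \dots, v_m) = \varphi_1(v_1) \times \cdots \times \varphi_m(v_m). \]

Then $\beta$ is an m-linear functional on $V_1 \times \cdots \times V_m$.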
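
The example can be tried out for $m = 3$. The sketch below assumes $V_1 = \mathbf{F}^2$, $V_2 = \mathbf{F}^3$, $V_3 = \mathbf{F}^2$, represents each $\varphi_k$ by a coefficient tuple, and checks linearity in the second slot for one choice of vectors and scalars. All names and sample values are ours.

```python
from math import prod

def functional(coeffs):
    """Return the linear functional x -> sum of coeffs[i] * x[i]."""
    return lambda x: sum(c * xi for c, xi in zip(coeffs, x))

def product_functional(phis):
    """beta(v_1, ..., v_m) = phi_1(v_1) * ... * phi_m(v_m)."""
    return lambda *vs: prod(phi(v) for phi, v in zip(phis, vs))

phis = [functional((1, 2)), functional((0, 1, -1)), functional((3, 1))]
beta = product_functional(phis)

v1, v2, v3 = (1, 1), (2, 0, 5), (1, -2)
u2 = (4, 1, 1)

# Linearity in the second slot with the other slots held fixed:
# beta(v1, 2 v2 + 3 u2, v3) = 2 beta(v1, v2, v3) + 3 beta(v1, u2, v3).
combo = tuple(2 * a + 3 * b for a, b in zip(v2, u2))
lhs = beta(v1, combo, v3)
rhs = 2 * beta(v1, v2, v3) + 3 * beta(v1, u2, v3)
print(lhs, rhs)
```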


The next result can be proved by imitating the proof of 9.70. 


9.87 dimension of the vector space of m-linear functionals 


\[ \dim \mathcal{B}(V_1, \dots, V_m) = (\dim V_1) \times \cdots \times (\dim V_m). \]
Now we can define the tensor product of multiple vector spaces and the tensor
product of elements of those vector spaces. The following definition is completely 
analogous to our previous definition (9.71) in the case m = 2. 


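9.88 definition: tensor product, $V_1 \otimes \cdots \otimes V_m$, $v_1 \otimes \cdots \otimes v_m$


• The tensor product $V_1 \otimes \cdots \otimes V_m$ is defined to be $\mathcal{B}(V_1', \dots, V_m')$.


• For $v_1 \in V_1, \dots, v_m \in V_m$, the tensor product $v_1 \otimes \cdots \otimes v_m$ is the element
of $V_1 \otimes \cdots \otimes V_m$ defined by


\[ (v_1 \otimes \cdots \otimes v_m)(\varphi_1, \dots, \varphi_m) = \varphi_1(v_1) \cdots \varphi_m(v_m) \]

for all $(\varphi_1, \dots, \varphi_m) \in V_1' \times \cdots \times V_m'$.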


The next result can be proved by following the pattern of the proof of the 
analogous result when m = 2 (see 9.72). 


9.89 dimension of the tensor product 


\[ \dim(V_1 \otimes \cdots \otimes V_m) = (\dim V_1) \cdots (\dim V_m). \]


Our next result generalizes 9.74. 


9.90 basis of $V_1 \otimes \cdots \otimes V_m$


Suppose $\dim V_k = n_k$ and $e^k_1, \dots, e^k_{n_k}$ is a basis of $V_k$ for $k = 1, \dots, m$. Then


\[ \{e^1_{j_1} \otimes \cdots \otimes e^m_{j_m}\}_{j_1 = 1, \dots, n_1;\; \cdots;\; j_m = 1, \dots, n_m} \]


is a basis of $V_1 \otimes \cdots \otimes V_m$.

Suppose $m = 2$ and $e^1_1, \dots, e^1_{n_1}$ is a basis of $V_1$ and $e^2_1, \dots, e^2_{n_2}$ is a basis of $V_2$.
Then with respect to the basis $\{e^1_{j_1} \otimes e^2_{j_2}\}_{j_1 = 1, \dots, n_1;\; j_2 = 1, \dots, n_2}$ in the result above, the
coefficients of an element of $V_1 \otimes V_2$ can be represented by an $n_1$-by-$n_2$ matrix that
contains the coefficient of $e^1_{j_1} \otimes e^2_{j_2}$ in row $j_1$, column $j_2$. Thus we need a matrix,
which is an array specified by two indices, to represent an element of $V_1 \otimes V_2$.

If $m > 2$, then the result above shows that we need an array specified by $m$
indices to represent an arbitrary element of $V_1 \otimes \cdots \otimes V_m$. Thus tensor products
may appear when we deal with objects specified by arrays with multiple indices.
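
For $m = 3$ the array of coefficients has three indices. The sketch below assumes $V_1 = \mathbf{F}^2$, $V_2 = \mathbf{F}^3$, $V_3 = \mathbf{F}^2$ with standard bases and stores the coefficients of $v_1 \otimes v_2 \otimes v_3$ as a nested list; the function name `tensor3` and the sample vectors are ours.

```python
def tensor3(v1, v2, v3):
    """Coefficient array of v1 (x) v2 (x) v3: entry [i][j][k] is v1[i]*v2[j]*v3[k]."""
    return [[[a * b * c for c in v3] for b in v2] for a in v1]

array = tensor3((1, 2), (3, 4, 5), (6, 7))

# The array has one index per factor; its shape is (n_1, n_2, n_3).
shape = (len(array), len(array[0]), len(array[0][0]))

# The number of entries equals (dim V_1)(dim V_2)(dim V_3), as in 9.89.
count = sum(len(row) for block in array for row in block)

print(shape, count, array[1][2][0])
```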

The next definition generalizes the notion of a bilinear map (see 9.77). As 
with bilinear maps, the target space can be an arbitrary vector space. 


9.91 definition: m-linear map


An m-linear map from $V_1 \times \cdots \times V_m$ to a vector space $U$ is a function
$\Gamma\colon V_1 \times \cdots \times V_m \to U$ that is a linear map in each slot when the other slots are
held fixed.
The next result can be proved by following the pattern of the proof of 9.79.


9.92 converting m-linear maps to linear maps


Suppose $U$ is a vector space.


(a) Suppose that $\Gamma\colon V_1 \times \cdots \times V_m \to U$ is an m-linear map. Then there exists
a unique linear map $\hat{\Gamma}\colon V_1 \otimes \cdots \otimes V_m \to U$ such that


\[ \hat{\Gamma}(v_1 \otimes \cdots \otimes v_m) = \Gamma(v_1, \dots, v_m) \]


for all $(v_1, \dots, v_m) \in V_1 \times \cdots \times V_m$.


(b) Conversely, suppose $T\colon V_1 \otimes \cdots \otimes V_m \to U$ is a linear map. Then there
exists a unique m-linear map $T^*\colon V_1 \times \cdots \times V_m \to U$ such that


\[ T^*(v_1, \dots, v_m) = T(v_1 \otimes \cdots \otimes v_m) \]


for all $(v_1, \dots, v_m) \in V_1 \times \cdots \times V_m$.


See Exercises 12 and 13 for tensor products of multiple inner product spaces. 


Exercises 9D 


1 Suppose $v \in V$ and $w \in W$. Prove that $v \otimes w = 0$ if and only if $v = 0$ or
$w = 0$.

2 Give an example of six distinct vectors $v_1, v_2, v_3, w_1, w_2, w_3$ in $\mathbf{R}^3$ such that

\[ v_1 \otimes w_1 + v_2 \otimes w_2 + v_3 \otimes w_3 = 0 \]

but none of $v_1 \otimes w_1$, $v_2 \otimes w_2$, $v_3 \otimes w_3$ is a scalar multiple of another element
of this list.

3 Suppose that $v_1, \dots, v_m$ is a linearly independent list in $V$. Suppose also that
$w_1, \dots, w_m$ is a list in $W$ such that

\[ v_1 \otimes w_1 + \cdots + v_m \otimes w_m = 0. \]

Prove that $w_1 = \cdots = w_m = 0$.

4 Suppose $\dim V > 1$ and $\dim W > 1$. Prove that

\[ \{v \otimes w : (v, w) \in V \times W\} \]

is not a subspace of $V \otimes W$.
This exercise implies that if $\dim V > 1$ and $\dim W > 1$, then

\[ \{v \otimes w : (v, w) \in V \times W\} \ne V \otimes W. \]


5 Suppose $m$ and $n$ are positive integers. For $v \in \mathbf{F}^m$ and $w \in \mathbf{F}^n$, identify
$v \otimes w$ with an m-by-n matrix as in Example 9.76. With that identification,
show that the set

\[ \{v \otimes w : v \in \mathbf{F}^m \text{ and } w \in \mathbf{F}^n\} \]

is the set of m-by-n matrices (with entries in $\mathbf{F}$) that have rank at most one.

6 Suppose $m$ and $n$ are positive integers. Give a description, analogous to
Exercise 5, of the set of m-by-n matrices (with entries in $\mathbf{F}$) that have rank
at most two.

7 Suppose $\dim V > 2$ and $\dim W > 2$. Prove that

\[ \{v_1 \otimes w_1 + v_2 \otimes w_2 : v_1, v_2 \in V \text{ and } w_1, w_2 \in W\} \ne V \otimes W. \]

8 Suppose $v_1, \dots, v_m \in V$ and $w_1, \dots, w_m \in W$ are such that

\[ v_1 \otimes w_1 + \cdots + v_m \otimes w_m = 0. \]

Suppose that $U$ is a vector space and $\Gamma\colon V \times W \to U$ is a bilinear map. Show
that

\[ \Gamma(v_1, w_1) + \cdots + \Gamma(v_m, w_m) = 0. \]

9 Suppose $S \in \mathcal{L}(V)$ and $T \in \mathcal{L}(W)$. Prove that there exists a unique operator
on $V \otimes W$ that takes $v \otimes w$ to $Sv \otimes Tw$ for all $v \in V$ and $w \in W$.
In an abuse of notation, the operator on $V \otimes W$ given by this exercise is
often called $S \otimes T$.

10 Suppose $S \in \mathcal{L}(V)$ and $T \in \mathcal{L}(W)$. Prove that $S \otimes T$ is an invertible operator
on $V \otimes W$ if and only if both $S$ and $T$ are invertible operators. Also, prove
that if both $S$ and $T$ are invertible operators, then $(S \otimes T)^{-1} = S^{-1} \otimes T^{-1}$,
where we are using the notation from the comment after Exercise 9.

11 Suppose $V$ and $W$ are inner product spaces. Prove that if $S \in \mathcal{L}(V)$ and
$T \in \mathcal{L}(W)$, then $(S \otimes T)^* = S^* \otimes T^*$, where we are using the notation from
the comment after Exercise 9.

12 Suppose that $V_1, \dots, V_m$ are finite-dimensional inner product spaces. Prove
that there is a unique inner product on $V_1 \otimes \cdots \otimes V_m$ such that

\[ \langle v_1 \otimes \cdots \otimes v_m, u_1 \otimes \cdots \otimes u_m \rangle = \langle v_1, u_1 \rangle \cdots \langle v_m, u_m \rangle \]

for all $(v_1, \dots, v_m)$ and $(u_1, \dots, u_m)$ in $V_1 \times \cdots \times V_m$.
Note that the equation above implies that

\[ \|v_1 \otimes \cdots \otimes v_m\| = \|v_1\| \times \cdots \times \|v_m\| \]

for all $(v_1, \dots, v_m) \in V_1 \times \cdots \times V_m$.

13 Suppose that $V_1, \dots, V_m$ are finite-dimensional inner product spaces and
$V_1 \otimes \cdots \otimes V_m$ is made into an inner product space using the inner product
from Exercise 12. Suppose $e^k_1, \dots, e^k_{n_k}$ is an orthonormal basis of $V_k$ for each
$k = 1, \dots, m$. Show that the list

\[ \{e^1_{j_1} \otimes \cdots \otimes e^m_{j_m}\}_{j_1 = 1, \dots, n_1;\; \cdots;\; j_m = 1, \dots, n_m} \]

is an orthonormal basis of $V_1 \otimes \cdots \otimes V_m$.